Usage Guidelines
This lesson is part of the DS Lab core curriculum. For that reason, this notebook can only be used on your WQU virtual machine.
This means:
- ⓧ No downloading this notebook.
- ⓧ No re-sharing of this notebook with friends or colleagues.
- ⓧ No downloading the embedded videos in this notebook.
- ⓧ No re-sharing embedded videos with friends or colleagues.
- ⓧ No adding this notebook to public or private repositories.
- ⓧ No uploading this notebook (or screenshots of it) to other websites, including websites for study resources.
8.5 Volatility Forecasting in South Africa 🇿🇦
In this assignment you'll build a model to predict stock volatility for the telecommunications company MTN Group.
Tip: There are some tasks in this assignment that you can complete by importing functions and classes you created for your app. Give it a try!
Warning: There are some tasks in this assignment where there is an extra code block that will transform your work into a submission that's compatible with the grader. Be sure to run those cells and inspect the submission before you submit to the grader.
```python
%load_ext autoreload
%autoreload 2

import wqet_grader
from arch.univariate.base import ARCHModelResult

wqet_grader.init("Project 8 Assessment")
```

```
The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload
```
```python
# Import your libraries here
import pandas as pd
import numpy as np
import requests
import sqlite3
import matplotlib.pyplot as plt
from arch import arch_model
from config import settings
from data import SQLRepository
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
```

Working with APIs
Task 8.5.1: Create a URL to get all the stock data for MTN Group ("MTNOY") from AlphaVantage in JSON format. Be sure to use the https://learn-api.wqu.edu hostname. And don't worry: your submission won't include your API key!
```python
ticker = "MTNOY"
output_size = "full"
data_type = "json"

url = (
    "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
    "function=TIME_SERIES_DAILY&"
    f"symbol={ticker}&"
    f"outputsize={output_size}&"
    f"datatype={data_type}&"
    f"apikey={settings.alpha_api_key}"
)

print("url type:", type(url))
url
```

```python
# Remove API key for submission
submission_851 = url[:170]
submission_851
```

```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.1", submission_851)
```

```
Python master 😁
```
Score: 1
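Not something the grader requires, but worth knowing: the standard library can assemble the same query string with `urllib.parse.urlencode`, which also percent-escapes values automatically. A minimal sketch using a placeholder API key (`"demo"` is not a real key):

```python
from urllib.parse import urlencode

# Same parameters as the task above; "demo" is a placeholder API key
params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": "MTNOY",
    "outputsize": "full",
    "datatype": "json",
    "apikey": "demo",
}
base = "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
url = base + urlencode(params)
print(url)
```

Because `dict` preserves insertion order, the parameters appear in the same order they were declared.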
Task 8.5.2: Create an HTTP request for the URL you created in the previous task. The grader will evaluate your work by looking at the ticker symbol in the "Meta Data" key-value pair in your response.
```python
response = requests.get(url)
print("response type:", type(response))
```

```
response type: <class 'requests.models.Response'>
```
```python
# Get symbol in `"Meta Data"`
submission_852 = response.json()["Meta Data"]["2. Symbol"]
submission_852
```

```
'MTNOY'
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.2", submission_852)
```

```
Wow, you're making great progress.
```
Score: 1
Task 8.5.3: Get the status code of your response and assign it to the variable response_code.
```python
response_code = response.status_code
print("code type:", type(response_code))
response_code
```

```
code type: <class 'int'>
200
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.3", response_code)
```

```
Wow, you're making great progress.
```
Score: 1
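The `200` above is the standard "OK" code. To interpret a status code without making a live request, the standard library's `http.HTTPStatus` enum maps each code to its reason phrase:

```python
from http import HTTPStatus

code = 200  # e.g. the value of response.status_code
status = HTTPStatus(code)

print(status.phrase)           # OK
print(200 <= code < 300)       # True means the request succeeded
print(HTTPStatus(404).phrase)  # Not Found
```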
Test-Driven Development
Task 8.5.4: Create a DataFrame df_mtnoy with all the stock data for MTN. Make sure that the DataFrame has the correct type of index and column names. The grader will evaluate your work by looking at the row in df_mtnoy for 6 December 2021.
```python
response_data = response.json()
stock_data = response_data["Time Series (Daily)"]

df_mtnoy = pd.DataFrame.from_dict(stock_data, orient="index", dtype="float")
df_mtnoy.index = pd.to_datetime(df_mtnoy.index)
df_mtnoy.index.name = "date"
df_mtnoy.columns = [c.split(". ")[1] for c in df_mtnoy.columns]

print("df_mtnoy type:", type(df_mtnoy))
df_mtnoy.head()
```

```
df_mtnoy type: <class 'pandas.core.frame.DataFrame'>
```
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2023-01-26 | 8.255 | 8.2950 | 8.248 | 8.295 | 3462.0 |
| 2023-01-25 | 8.100 | 8.1425 | 8.075 | 8.075 | 28029.0 |
| 2023-01-24 | 8.030 | 8.0850 | 8.020 | 8.020 | 7391.0 |
| 2023-01-23 | 7.890 | 8.0265 | 7.890 | 7.980 | 16090.0 |
| 2023-01-20 | 7.810 | 7.9300 | 7.810 | 7.930 | 10861.0 |
```python
# Get row for 6 Dec 2021
submission_854 = df_mtnoy.loc["2021-12-06"].to_frame().T
submission_854
```

| | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2021-12-06 | 10.16 | 10.18 | 10.11 | 10.11 | 13542.0 |
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.4", submission_854)
```

```
Way to go!
```
Score: 1
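The wrangling steps above (`orient="index"`, datetime index, column renaming) can be rehearsed on a toy payload shaped like AlphaVantage's `"Time Series (Daily)"` block. The values below are invented for illustration:

```python
import pandas as pd

# Invented two-day payload mimicking the AlphaVantage response structure
stock_data = {
    "2021-12-07": {"1. open": "10.20", "2. high": "10.30", "3. low": "10.10",
                   "4. close": "10.25", "5. volume": "1000"},
    "2021-12-06": {"1. open": "10.16", "2. high": "10.18", "3. low": "10.11",
                   "4. close": "10.11", "5. volume": "13542"},
}

# Keys become the index; dtype="float" converts the string values
df = pd.DataFrame.from_dict(stock_data, orient="index", dtype="float")
df.index = pd.to_datetime(df.index)
df.index.name = "date"
df.columns = [c.split(". ")[1] for c in df.columns]  # "1. open" -> "open"
print(df)
```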
Task 8.5.5: Connect to the database whose name is stored in the .env file for this project. Be sure to set the check_same_thread argument to False. Assign the connection to the variable connection. The grader will evaluate your work by looking at the database location assigned to connection.
```python
connection = sqlite3.connect(database=settings.db_name, check_same_thread=False)
connection
```

```
<sqlite3.Connection at 0x7f764e6457b0>
```
```python
# Get location of database for `connection`
submission_855 = connection.cursor().execute("PRAGMA database_list;").fetchall()[0][-1]
submission_855
```

```
'/home/jovyan/work/ds-curriculum/080-volatility-forecasting-in-india/stocks.sqlite'
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.5", submission_855)
```

```
Very impressive.
```
Score: 1
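The `PRAGMA database_list;` statement used for the submission works on any SQLite connection: each row is `(seq, name, file)`, so the last column holds the database's file path. A self-contained sketch with an in-memory database, whose `file` entry is the empty string:

```python
import sqlite3

connection = sqlite3.connect(":memory:", check_same_thread=False)
rows = connection.cursor().execute("PRAGMA database_list;").fetchall()

# Each row is (seq, name, file); file is '' for an in-memory database
print(rows)  # [(0, 'main', '')]
```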
Task 8.5.6: Insert df_mtnoy into your database. The grader will evaluate your work by looking at the first five rows of the MTNOY table in the database.
```python
# Insert `MTNOY` data into database
repo = SQLRepository(connection=connection)
repo.insert_table(table_name="MTNOY", records=df_mtnoy, if_exists="replace")
```

```
{'transaction_successful': True, 'records_inserted': 3913}
```

```python
# Get first five rows of `MTNOY` table
submission_856 = pd.read_sql(sql="SELECT * FROM MTNOY LIMIT 5", con=connection)
submission_856
```

| | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2023-01-26 00:00:00 | 8.255 | 8.2950 | 8.248 | 8.295 | 3462.0 |
| 1 | 2023-01-25 00:00:00 | 8.100 | 8.1425 | 8.075 | 8.075 | 28029.0 |
| 2 | 2023-01-24 00:00:00 | 8.030 | 8.0850 | 8.020 | 8.020 | 7391.0 |
| 3 | 2023-01-23 00:00:00 | 7.890 | 8.0265 | 7.890 | 7.980 | 16090.0 |
| 4 | 2023-01-20 00:00:00 | 7.810 | 7.9300 | 7.810 | 7.930 | 10861.0 |
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.6", submission_856)
```

```
Party time! 🎉🎉🎉
```
Score: 1
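`SQLRepository` comes from this project's `data` module, so its internals aren't shown here; a typical `insert_table` implementation wraps pandas' `DataFrame.to_sql`. A standalone sketch of that pattern against an in-memory database, using a tiny invented stand-in for `df_mtnoy`:

```python
import sqlite3

import pandas as pd

# Tiny invented stand-in for `df_mtnoy`
df = pd.DataFrame(
    {"close": [8.295, 8.075]},
    index=pd.to_datetime(["2023-01-26", "2023-01-25"]),
)
df.index.name = "date"

connection = sqlite3.connect(":memory:")

# `if_exists="replace"` drops and recreates the table, as in the task above
df.to_sql(name="MTNOY", con=connection, if_exists="replace")

print(pd.read_sql("SELECT * FROM MTNOY LIMIT 5", con=connection))
```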
Task 8.5.7: Read the MTNOY table from your database and assign the output to df_mtnoy_read. The grader will evaluate your work by looking at the row for 27 April 2022.
```python
df_mtnoy_read = repo.read_table(table_name="MTNOY")

print("df_mtnoy_read type:", type(df_mtnoy_read))
print("df_mtnoy_read shape:", df_mtnoy_read.shape)
df_mtnoy_read.head()
```

```
df_mtnoy_read type: <class 'pandas.core.frame.DataFrame'>
df_mtnoy_read shape: (3913, 5)
```

| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2023-01-26 | 8.255 | 8.2950 | 8.248 | 8.295 | 3462.0 |
| 2023-01-25 | 8.100 | 8.1425 | 8.075 | 8.075 | 28029.0 |
| 2023-01-24 | 8.030 | 8.0850 | 8.020 | 8.020 | 7391.0 |
| 2023-01-23 | 7.890 | 8.0265 | 7.890 | 7.980 | 16090.0 |
| 2023-01-20 | 7.810 | 7.9300 | 7.810 | 7.930 | 10861.0 |
```python
# Get row for 27 April 2022
submission_857 = df_mtnoy_read.loc["2022-04-27"].to_frame().T
submission_857
```

| | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2022-04-27 | 10.71 | 10.85 | 10.5 | 10.65 | 23927.0 |
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.7", submission_857)
```

```
Excellent! Keep going.
```
Score: 1
Predicting Volatility
Prepare Data
Task 8.5.8: Create a Series y_mtnoy with the 2,500 most recent returns for MTN. The grader will evaluate your work by looking at the volatility for 9 August 2022.
```python
df = repo.read_table(table_name="MTNOY", limit=2500 + 1)
df.sort_index(inplace=True)
df["return"] = df["close"].pct_change() * 100
y_mtnoy = df["return"].dropna()

print("y_mtnoy type:", type(y_mtnoy))
print("y_mtnoy shape:", y_mtnoy.shape)
y_mtnoy.head()
```

```
y_mtnoy type: <class 'pandas.core.series.Series'>
y_mtnoy shape: (2500,)
date
2013-02-22   -0.970874
2013-02-25   -1.176471
2013-02-26    0.694444
2013-02-27   -0.492611
2013-02-28   -2.722772
Name: return, dtype: float64
```
```python
# Get return for 9 Aug 2022
submission_859 = float(y_mtnoy["2022-08-09"])
submission_859
```

```
1.5783540022547893
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.8", submission_859)
```

```
Yes! Your hard work is paying off.
```
Score: 1
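The `pct_change() * 100` call computes simple percent returns: r_t = (p_t - p_{t-1}) / p_{t-1} * 100. The same arithmetic on a few invented closing prices, with the first observation dropped just as `dropna()` does above:

```python
# Invented closing prices
closes = [10.30, 10.11, 10.25]

# Percent return relative to the previous close; the first close has no
# predecessor, so the result has len(closes) - 1 entries (cf. dropna above)
returns = [(curr - prev) / prev * 100 for prev, curr in zip(closes, closes[1:])]
print(returns)
```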
Task 8.5.9: Calculate daily volatility for y_mtnoy, and assign the result to mtnoy_daily_volatility.
```python
mtnoy_daily_volatility = y_mtnoy.std()

print("mtnoy_daily_volatility type:", type(mtnoy_daily_volatility))
print("MTN Daily Volatility:", mtnoy_daily_volatility)
```

```
mtnoy_daily_volatility type: <class 'float'>
MTN Daily Volatility: 2.914284074738285
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.9", mtnoy_daily_volatility)
```

```
Wow, you're making great progress.
```
Score: 1
Task 8.5.10: Calculate the annual volatility for y_mtnoy, and assign the result to mtnoy_annual_volatility.
```python
mtnoy_annual_volatility = mtnoy_daily_volatility * np.sqrt(252)

print("mtnoy_annual_volatility type:", type(mtnoy_annual_volatility))
print("MTN Annual Volatility:", mtnoy_annual_volatility)
```

```
mtnoy_annual_volatility type: <class 'numpy.float64'>
MTN Annual Volatility: 46.26282546932085
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.10", float(mtnoy_annual_volatility))
```

```
You = coding 🥷
```
Score: 1
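Both volatility tasks reduce to simple arithmetic: daily volatility is the standard deviation of the daily returns, and annualizing scales it by the square root of 252, the conventional number of trading days in a year. A stdlib-only sketch with invented returns:

```python
import math
import statistics

# Invented daily percent returns; daily volatility is their standard deviation
returns = [-0.97, -1.18, 0.69, -0.49, -2.72]
daily_volatility = statistics.stdev(returns)

# Annualize by scaling with sqrt(252) trading days
annual_volatility = daily_volatility * math.sqrt(252)

print("daily: ", daily_volatility)
print("annual:", annual_volatility)
```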
Task 8.5.11: Create a time series line plot for y_mtnoy. Be sure to label the x-axis "Date", the y-axis "Returns", and use the title "Time Series of MTNOY Returns".
```python
# Create `fig` and `ax`
fig, ax = plt.subplots(figsize=(15, 6))

# Plot `y_mtnoy` on `ax`
y_mtnoy.plot(ax=ax)

# Add axis labels
plt.xlabel("Date")
plt.ylabel("Returns")

# Add title
plt.title("Time Series of MTNOY Returns")

# Don't delete the code below 👇
plt.savefig("images/8-5-11.png", dpi=150)
```

```python
with open("images/8-5-11.png", "rb") as file:
    wqet_grader.grade("Project 8 Assessment", "Task 8.5.11", file)
```

```
Awesome work.
```
Score: 1
Task 8.5.12: Create an ACF plot of the squared returns for MTN. Be sure to label the x-axis "Lag [days]", the y-axis "Correlation Coefficient", and use the title "ACF of MTNOY Squared Returns".
```python
# Create `fig` and `ax`
fig, ax = plt.subplots(figsize=(15, 6))

# Create ACF of squared returns
plot_acf(y_mtnoy ** 2, ax=ax);

# Add axis labels
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient")

# Add title
plt.title("ACF of MTNOY Squared Returns")

# Don't delete the code below 👇
plt.savefig("images/8-5-12.png", dpi=150)
```

```python
with open("images/8-5-12.png", "rb") as file:
    wqet_grader.grade("Project 8 Assessment", "Task 8.5.12", file)
```

```
Yes! Great problem solving.
```
Score: 1
Task 8.5.13: Create a PACF plot of the squared returns for MTN. Be sure to label the x-axis "Lag [days]", the y-axis "Correlation Coefficient", and use the title "PACF of MTNOY Squared Returns".
```python
# Create `fig` and `ax`
fig, ax = plt.subplots(figsize=(15, 6))

# Create PACF of squared returns
plot_pacf(y_mtnoy ** 2, ax=ax);

# Add axis labels
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient")

# Add title
plt.title("PACF of MTNOY Squared Returns")

# Don't delete the code below 👇
plt.savefig("images/8-5-13.png", dpi=150)
```

```python
with open("images/8-5-13.png", "rb") as file:
    wqet_grader.grade("Project 8 Assessment", "Task 8.5.13", file)
```

```
You're making this look easy. 😉
```
Score: 1
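`plot_acf` and `plot_pacf` visualize autocorrelation coefficients; the lag-k autocorrelation itself is straightforward to compute, which can help when reading the plots. A sketch on a short invented series of squared returns:

```python
# acf(k) = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2, where m = mean(x)
def acf(x, k):
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(len(x) - k))
    den = sum((v - m) ** 2 for v in x)
    return num / den

squared_returns = [r ** 2 for r in [-0.97, -1.18, 0.69, -0.49, -2.72, 1.58]]
print(acf(squared_returns, k=0))  # always 1.0 at lag 0
print(acf(squared_returns, k=1))
```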
Task 8.5.14: Create a training set y_mtnoy_train that contains the first 80% of the observations in y_mtnoy.
```python
cutoff_test = int(len(y_mtnoy) * 0.8)
y_mtnoy_train = y_mtnoy.iloc[:cutoff_test]

print("y_mtnoy_train type:", type(y_mtnoy_train))
print("y_mtnoy_train shape:", y_mtnoy_train.shape)
y_mtnoy_train.head()
```

```
y_mtnoy_train type: <class 'pandas.core.series.Series'>
y_mtnoy_train shape: (2000,)
date
2013-02-22   -0.970874
2013-02-25   -1.176471
2013-02-26    0.694444
2013-02-27   -0.492611
2013-02-28   -2.722772
Name: return, dtype: float64
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.14", y_mtnoy_train)
```

```
You got it. Dance party time! 🕺💃🕺💃
```
Score: 1
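Note that the split above is chronological (`iloc[:cutoff]`), never shuffled: shuffling a time series would leak future observations into the training set. The pattern in miniature:

```python
# Stand-in for `y_mtnoy`; any ordered sequence works the same way
y = list(range(10))

cutoff = int(len(y) * 0.8)
y_train, y_test = y[:cutoff], y[cutoff:]

print(len(y_train), len(y_test))  # 8 2
```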
Build Model
Task 8.5.15: Build and fit a GARCH model using the data in y_mtnoy. Try different values for p and q, using the summary to assess its performance. The grader will evaluate whether your model is the correct data type.
```python
# Build and train model
model = arch_model(y_mtnoy_train, p=1, q=1, rescale=False).fit(disp=0)

print("model type:", type(model))

# Show model summary
model.summary()
```

```
model type: <class 'arch.univariate.base.ARCHModelResult'>
```
| Dep. Variable: | return | R-squared: | 0.000 |
|---|---|---|---|
| Mean Model: | Constant Mean | Adj. R-squared: | 0.000 |
| Vol Model: | GARCH | Log-Likelihood: | -4737.24 |
| Distribution: | Normal | AIC: | 9482.48 |
| Method: | Maximum Likelihood | BIC: | 9504.88 |
| No. Observations: | 2000 | | |
| Date: | Fri, Jan 27 2023 | Df Residuals: | 1999 |
| Time: | 05:34:29 | Df Model: | 1 |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| mu | -0.0181 | 5.440e-02 | -0.333 | 0.739 | [ -0.125, 8.852e-02] |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| omega | 0.1290 | 5.796e-02 | 2.226 | 2.603e-02 | [1.540e-02, 0.243] |
| alpha[1] | 0.0740 | 1.754e-02 | 4.221 | 2.431e-05 | [3.965e-02, 0.108] |
| beta[1] | 0.9124 | 1.925e-02 | 47.394 | 0.000 | [ 0.875, 0.950] |

Covariance estimator: robust
```python
submission_8515 = isinstance(model, ARCHModelResult)
submission_8515
```

```
True
```
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.15", [submission_8515])
```

```
Yes! Your hard work is paying off.
```
Score: 1
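Behind the summary, a GARCH(1,1) model updates its conditional variance as sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}. A sketch of that recursion using the coefficients reported above and a few invented returns (seeding with the unconditional variance is one common choice):

```python
# Coefficients copied from the model summary above
omega, alpha, beta = 0.1290, 0.0740, 0.9124

# Invented daily percent returns
returns = [-0.97, -1.18, 0.69, -0.49, -2.72]

# Seed with the unconditional variance omega / (1 - alpha - beta)
sigma2 = omega / (1 - alpha - beta)
for r in returns:
    sigma2 = omega + alpha * r ** 2 + beta * sigma2

print("next-day variance:  ", sigma2)
print("next-day volatility:", sigma2 ** 0.5)
```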
Task 8.5.16: Plot the standardized residuals for your model. Be sure to label the x-axis "Date", the y-axis "Value", and use the title "MTNOY GARCH Model Standardized Residuals".
```python
# Create `fig` and `ax`
fig, ax = plt.subplots(figsize=(15, 6))

# Plot standardized residuals
model.std_resid.plot(ax=ax)

# Add axis labels
plt.xlabel("Date")
plt.ylabel("Value")

# Add title
plt.title("MTNOY GARCH Model Standardized Residuals")

# Don't delete the code below 👇
plt.savefig("images/8-5-16.png", dpi=150)
```

```python
with open("images/8-5-16.png", "rb") as file:
    wqet_grader.grade("Project 8 Assessment", "Task 8.5.16", file)
```

```
Awesome work.
```
Score: 1
Task 8.5.17: Create an ACF plot of the squared, standardized residuals of your model. Be sure to label the x-axis "Lag [days]", the y-axis "Correlation Coefficient", and use the title "ACF of MTNOY GARCH Model Standardized Residuals".
```python
# Create `fig` and `ax`
fig, ax = plt.subplots(figsize=(15, 6))

# Create ACF of squared, standardized residuals
plot_acf(model.std_resid ** 2, ax=ax);

# Add axis labels
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient")

# Add title
plt.title("ACF of MTNOY GARCH Model Standardized Residuals")

# Don't delete the code below 👇
plt.savefig("images/8-5-17.png", dpi=150)
```

```python
with open("images/8-5-17.png", "rb") as file:
    wqet_grader.grade("Project 8 Assessment", "Task 8.5.17", file)
```

```
You = coding 🥷
```
Score: 1
Model Deployment
Ungraded Task: If it's not already running, start your app server.
Task 8.5.18: Change the fit method of your GarchModel class so that, when a model is done training, two more attributes are added to the object: self.aic with the AIC for the model, and self.bic with the BIC for the model. When you're done, use the cell below to check your work.
Tip: How can you access the AIC and BIC scores programmatically? Every ARCHModelResult has an .aic and a .bic attribute.
```python
# Import `build_model` function
from main import build_model

# Build model using new `MTNOY` data
model = build_model(ticker="MTNOY", use_new_data=True)

# Wrangle `MTNOY` returns
model.wrangle_data(n_observations=2500)

# Fit GARCH(1,1) model to data
model.fit(p=1, q=1)

# Does model have AIC and BIC attributes?
assert hasattr(model, "aic")
assert hasattr(model, "bic")
```

```python
# Put test results into dictionary
submission_8518 = {"has_aic": hasattr(model, "aic"), "has_bic": hasattr(model, "bic")}
submission_8518
```

```
{'has_aic': True, 'has_bic': True}
```

```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.18", submission_8518)
```

```
Yes! Great problem solving.
```
Score: 1
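For reference, the scores `arch` exposes follow the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations, and ln(L) the log-likelihood. Plugging in the values from the GARCH summary earlier in this notebook (ln L = -4737.24, k = 4, n = 2000) reproduces its AIC/BIC row:

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# Values from the GARCH(1,1) summary above (mu, omega, alpha[1], beta[1] -> k=4)
print(aic(-4737.24, k=4))          # 9482.48
print(bic(-4737.24, k=4, n=2000))  # ~9504.88
```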
Task 8.5.19: Change the fit_model function in the main module so that the "message" it returns includes the AIC and BIC scores. For example, the message should look something like this:
"Trained and saved 'models/2022-10-12T23:10:06.577238_MTNOY.pkl'. Metrics: AIC 9892.184665169907, BIC 9914.588275008075."
When you're done, use the cell below to check your work.
```python
# Import `FitIn` class and `fit_model` function
from main import FitIn, fit_model

# Instantiate `FitIn` object
request = FitIn(ticker="MTNOY", use_new_data=False, n_observations=2500, p=1, q=1)

# Build model and fit to data, following parameters in `request`
fit_out = fit_model(request=request)

# Inspect `fit_out`
fit_out
```

```
{'ticker': 'MTNOY',
 'p': 1,
 'q': 1,
 'n_observations': 2500,
 'use_new_data': False,
 'success': True,
 'message': 'Trained and save models/2023-01-27T05:48:09.972264_MTNOY.pkl. Metrics: AIC 12006.325788440472, BIC 12029.621972483897.'}
```

```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.19", fit_out)
```

```
You got it. Dance party time! 🕺💃🕺💃
```
Score: 1
Task 8.5.20: Create a post request to hit the "/fit" path running at "http://localhost:8008". You should train a GARCH(1,1) model on 2500 observations of the MTN data you already downloaded. Pass in your parameters as a dictionary using the json argument. The grader will evaluate the JSON of your response.
```python
# URL of `/fit` path
url = "http://localhost:8008/fit"

# Data to send to path
json = {
    "ticker": "MTNOY",
    "use_new_data": False,
    "n_observations": 2500,
    "p": 1,
    "q": 1,
}

# Response of post request
response = requests.post(url=url, json=json)

print("response type:", type(response))
print("response status code:", response.status_code)
```

```
response type: <class 'requests.models.Response'>
response status code: 200
```
```python
submission_8520 = response.json()
submission_8520
```

```
{'ticker': 'MTNOY',
 'p': 1,
 'q': 1,
 'n_observations': 2500,
 'use_new_data': False,
 'success': True,
 'message': 'Trained and save models/2023-01-27T06:00:24.703316_MTNOY.pkl. Metrics: AIC 12006.325788440472, BIC 12029.621972483897.'}
```

```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.20", submission_8520)
```

```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In [104], line 1
----> 1 wqet_grader.grade("Project 8 Assessment", "Task 8.5.20", submission_8520)

File /opt/conda/lib/python3.9/site-packages/wqet_grader/__init__.py:182, in grade(assessment_id, question_id, submission)
    177 def grade(assessment_id, question_id, submission):
    178   submission_object = {
    179     'type': 'simple',
    180     'argument': [submission]
    181   }
--> 182   return show_score(grade_submission(assessment_id, question_id, submission_object))

File /opt/conda/lib/python3.9/site-packages/wqet_grader/transport.py:146, in grade_submission(assessment_id, question_id, submission_object)
    144   raise Exception('Grader raised error: {}'.format(error['message']))
    145 else:
--> 146   raise Exception('Could not grade submission: {}'.format(error['message']))
    147 result = envelope['data']['result']
    149 # Used only in testing

Exception: Could not grade submission: Could not verify access to this assessment: Received error from WQET submission API: Could not find existing program enrollment for user
```
Task 8.5.21: Create a post request to hit the "/predict" path running at "http://localhost:8008". You should get the 5-day volatility forecast for MTN. When you're satisfied, submit your work to the grader.
```python
# URL of `/predict` path
url = ...

# Data to send to path
json = ...

# Response of post request
response = ...

print("response type:", type(response))
print("response status code:", response.status_code)
```

```python
submission_8521 = response.json()
submission_8521
```

```python
wqet_grader.grade("Project 8 Assessment", "Task 8.5.21", submission_8521)
```

Copyright 2022 WorldQuant University. This content is licensed solely for personal use. Redistribution or publication of this material is strictly prohibited.
8.4. Model Deployment
Ready for deployment! Over the last three lessons, we've built all the pieces we need for our application. We have a module for getting and storing our data. We have the code to train our model and clean its predictions. In this lesson, we're going to put all those pieces together and deploy our model with an API that others can use to train their own models and predict volatility. We'll start by creating a module for all the code we created in the last lesson. Then we'll complete our main module, which will hold our FastAPI application with two paths: one for model training and one for prediction. Let's jump in!
```python
%load_ext autoreload
%autoreload 2

import os
import sqlite3
from glob import glob

import joblib
import pandas as pd
import requests
import wqet_grader
from arch.univariate.base import ARCHModelResult
from config import settings
from data import SQLRepository
from IPython.display import VimeoVideo

wqet_grader.init("Project 8 Assessment")
```

```python
VimeoVideo("772219745", h="f3bfda20cd", width=600)
```

Model Module
We wrote a lot of code in the last lesson to build, train, and make predictions with our GARCH(1,1) model. We want this code to be reusable, so let's put it in its own module.
Let's start by instantiating a repository that we'll use for testing our module as we build.
```python
VimeoVideo("772219717", h="8f1afa7919", width=600)
```

Task 8.4.1: Create a SQLRepository named repo. Be sure that it's attached to a SQLite connection.
```python
connection = sqlite3.connect(settings.db_name, check_same_thread=False)
repo = SQLRepository(connection=connection)

print("repo type:", type(repo))
print("repo.connection type:", type(repo.connection))
```

```
repo type: <class 'data.SQLRepository'>
repo.connection type: <class 'sqlite3.Connection'>
```
Now that we have the repo ready, we'll shift to our model module and build a GarchModel class to hold all our code from the last lesson.

```python
VimeoVideo("772219669", h="1d225ab776", width=600)
```

Task 8.4.2: In the model module, create a definition for a GarchModel class. For now, it should only have an __init__ method. Use the docstring as a guide. When you're done, test your class using the assert statements below.
```python
from model import GarchModel

# Instantiate a `GarchModel`
gm_ambuja = GarchModel(ticker="AMBUJACEM.BSE", repo=repo, use_new_data=False)

# Does `gm_ambuja` have the correct attributes?
assert gm_ambuja.ticker == "AMBUJACEM.BSE"
assert gm_ambuja.repo == repo
assert not gm_ambuja.use_new_data
assert gm_ambuja.model_directory == settings.model_directory
```

```python
VimeoVideo("772219593", h="3f3c401c04", width=600)
```

Task 8.4.3: Turn your wrangle_data function from the last lesson into a method for your GarchModel class. When you're done, use the assert statements below to test the method by getting and wrangling data for the department store Shoppers Stop.
```python
# Instantiate `GarchModel`, use new data
model_shop = GarchModel(ticker="SHOPERSTOP.BSE", repo=repo, use_new_data=True)

# Check that model doesn't have `data` attribute yet
assert not hasattr(model_shop, "data")

# Wrangle data
model_shop.wrangle_data(n_observations=1000)

# Does model now have `data` attribute?
assert hasattr(model_shop, "data")

# Is the `data` a Series?
assert isinstance(model_shop.data, pd.Series)

# Is Series correct shape?
assert model_shop.data.shape == (1000,)

model_shop.data.head()
```

```
date
2019-01-15   -0.136041
2019-01-16    1.002238
2019-01-17   -2.003854
2019-01-18   -0.471884
2019-01-21    1.363098
Name: return, dtype: float64
```
```python
VimeoVideo("772219535", h="55fbfdff55", width=600)
```

Task 8.4.4: Using your code from the previous lesson, create a fit method for your GarchModel class. When you're done, use the code below to test it.
- Write a class method in Python.
- What's an assert statement?
- Write an assert statement in Python.
```python
# Instantiate `GarchModel`, use old data
model_shop = GarchModel(ticker="SHOPERSTOP.BSE", repo=repo, use_new_data=False)

# Wrangle data
model_shop.wrangle_data(n_observations=1000)

# Fit GARCH(1,1) model to data
model_shop.fit(p=1, q=1)

# Does `model_shop` have a `model` attribute now?
assert hasattr(model_shop, "model")

# Is model correct data type?
assert isinstance(model_shop.model, ARCHModelResult)

# Does model have correct parameters?
assert model_shop.model.params.index.tolist() == ["mu", "omega", "alpha[1]", "beta[1]"]

# Check model parameters
model_shop.model.summary()
```

| Dep. Variable: | return | R-squared: | 0.000 |
|---|---|---|---|
| Mean Model: | Constant Mean | Adj. R-squared: | 0.000 |
| Vol Model: | GARCH | Log-Likelihood: | -2428.86 |
| Distribution: | Normal | AIC: | 4865.73 |
| Method: | Maximum Likelihood | BIC: | 4885.36 |
| No. Observations: | 1000 | | |
| Date: | Thu, Jan 26 2023 | Df Residuals: | 999 |
| Time: | 12:12:19 | Df Model: | 1 |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| mu | 0.1028 | 7.712e-02 | 1.333 | 0.183 | [-4.836e-02, 0.254] |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| omega | 0.1464 | 0.188 | 0.781 | 0.435 | [ -0.221, 0.514] |
| alpha[1] | 0.0372 | 2.485e-02 | 1.497 | 0.134 | [-1.151e-02, 8.590e-02] |
| beta[1] | 0.9468 | 4.437e-02 | 21.340 | 4.838e-101 | [ 0.860, 1.034] |

Covariance estimator: robust
```python
VimeoVideo("772219489", h="3de8abb0e6", width=600)
```

Task 8.4.5: Using your code from the previous lesson, create a predict_volatility method for your GarchModel class. Your method will need to return predictions as a dictionary, so you'll need to add your clean_prediction function as a helper method. When you're done, test your work using the assert statements below.
```python
# Generate prediction from `model_shop`
prediction = model_shop.predict_volatility(horizon=5)

# Is prediction a dictionary?
assert isinstance(prediction, dict)

# Are keys correct data type?
assert all(isinstance(k, str) for k in prediction.keys())

# Are values correct data type?
assert all(isinstance(v, float) for v in prediction.values())

prediction
```

```
{'2023-01-26T00:00:00': 2.1863071831011722,
 '2023-01-27T00:00:00': 2.202192500759229,
 '2023-01-30T00:00:00': 2.217711789211658,
 '2023-01-31T00:00:00': 2.232876702276906,
 '2023-02-01T00:00:00': 2.247698343791981}
```

Things are looking good! There are two last methods that we need to add to our GarchModel so that we can save a trained model and then load it when we need it. When we learned about saving and loading files in Project 5, we used a context handler. This time, we'll streamline the process using the joblib library. We'll also start writing our filepaths more programmatically using the os library.
```python
VimeoVideo("772219427", h="0dd5731a0d", width=600)
```

Task 8.4.6: Create a dump method for your GarchModel class. It should save the model assigned to the model attribute to the folder specified in your configuration settings. Use the docstring as a guide, and then test your work below.
```python
# Save `model_shop` model, assign filename
filename = model_shop.dump()

# Is `filename` a string?
assert isinstance(filename, str)

# Does filename include ticker symbol?
assert model_shop.ticker in filename

# Does file exist?
assert os.path.exists(filename)

filename
```

```
'models/2023-01-26T12:27:09.812765_SHOPERSTOP.BSE.pkl'
```
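The timestamped filename above is produced by the `dump` method in `model.py`. The same save pattern can be sketched with only the standard library, with `pickle` standing in for `joblib` and a temporary directory standing in for the `models/` folder:

```python
import os
import pickle
import tempfile
from datetime import datetime

# Stand-in for a fitted model object
model = {"params": [0.1290, 0.0740, 0.9124]}

model_directory = tempfile.mkdtemp()  # stands in for settings.model_directory
timestamp = datetime.now().isoformat()
filepath = os.path.join(model_directory, f"{timestamp}_MTNOY.pkl")

with open(filepath, "wb") as f:
    pickle.dump(model, f)

print(os.path.exists(filepath))  # True
```

Prefixing the filename with an ISO-8601 timestamp means a plain lexicographic `sorted()` also sorts the files chronologically, which is what the `load` function in the next task relies on.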
```python
VimeoVideo("772219326", h="4e1f9421e4", width=600)
```

Task 8.4.7: Create a load function below that will take a ticker symbol as input and return a model. When you're done, use the next cell to load the Shoppers Stop model you saved in the previous task.
```python
def load(ticker):
    """Load latest model from model directory.

    Parameters
    ----------
    ticker : str
        Ticker symbol for which model was trained.

    Returns
    -------
    `ARCHModelResult`
    """
    # Create pattern for glob search
    pattern = os.path.join(settings.model_directory, f"*{ticker}.pkl")

    # Try to find path of latest model
    # Handle possible `IndexError`
    try:
        model_path = sorted(glob(pattern))[-1]
    except IndexError:
        raise Exception(f"No model trained for {ticker}")

    # Load model
    model = joblib.load(model_path)

    # Return model
    return model
```

```python
# Assign load output to `model_shop`
model_shop = load(ticker="SHOPERSTOP.BSE")

# Does function return an `ARCHModelResult`?
assert isinstance(model_shop, ARCHModelResult)

# Check model parameters
model_shop.summary()
```

| Dep. Variable: | return | R-squared: | 0.000 |
|---|---|---|---|
| Mean Model: | Constant Mean | Adj. R-squared: | 0.000 |
| Vol Model: | GARCH | Log-Likelihood: | -2428.86 |
| Distribution: | Normal | AIC: | 4865.73 |
| Method: | Maximum Likelihood | BIC: | 4885.36 |
| No. Observations: | 1000 | | |
| Date: | Thu, Jan 26 2023 | Df Residuals: | 999 |
| Time: | 12:12:19 | Df Model: | 1 |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| mu | 0.1028 | 7.712e-02 | 1.333 | 0.183 | [-4.836e-02, 0.254] |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| omega | 0.1464 | 0.188 | 0.781 | 0.435 | [ -0.221, 0.514] |
| alpha[1] | 0.0372 | 2.485e-02 | 1.497 | 0.134 | [-1.151e-02, 8.590e-02] |
| beta[1] | 0.9468 | 4.437e-02 | 21.340 | 4.838e-101 | [ 0.860, 1.034] |

Covariance estimator: robust
```python
VimeoVideo("772219392", h="deed99bf85", width=600)
```

Task 8.4.8: Transform your load function into a method for your GarchModel class. When you're done, test the method using the assert statements below.
```python
model_shop = GarchModel(ticker="SHOPERSTOP.BSE", repo=repo, use_new_data=False)

# Check that new `model_shop` doesn't have model attached
assert not hasattr(model_shop, "model")

# Load model
model_shop.load()

# Does `model_shop` have model attached?
assert hasattr(model_shop, "model")

model_shop.model.summary()
```

| Dep. Variable: | return | R-squared: | 0.000 |
|---|---|---|---|
| Mean Model: | Constant Mean | Adj. R-squared: | 0.000 |
| Vol Model: | GARCH | Log-Likelihood: | -2428.86 |
| Distribution: | Normal | AIC: | 4865.73 |
| Method: | Maximum Likelihood | BIC: | 4885.36 |
| No. Observations: | 1000 | | |
| Date: | Thu, Jan 26 2023 | Df Residuals: | 999 |
| Time: | 12:12:19 | Df Model: | 1 |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| mu | 0.1028 | 7.712e-02 | 1.333 | 0.183 | [-4.836e-02, 0.254] |

| | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| omega | 0.1464 | 0.188 | 0.781 | 0.435 | [ -0.221, 0.514] |
| alpha[1] | 0.0372 | 2.485e-02 | 1.497 | 0.134 | [-1.151e-02, 8.590e-02] |
| beta[1] | 0.9468 | 4.437e-02 | 21.340 | 4.838e-101 | [ 0.860, 1.034] |

Covariance estimator: robust
Our model module is done! Now it's time to move on to the "main" course and add the final piece to our application.

Main Module

Similar to the interactive applications we made in Projects 6 and 7, our first step here will be to create an app object. This time, instead of being a plotly application, it'll be a FastAPI application.

VimeoVideo("772219283", h="2cd1d97516", width=600)

Task 8.4.9: In the main module, instantiate a FastAPI application named app.

In order for our app to work, we need to run it on a server. In this case, we'll run the server on our virtual machine using the uvicorn library.
VimeoVideo("772219237", h="5ee74f82db", width=600)

Task 8.4.10: Go to the command line, navigate to the directory for this project, and start your app server by entering the following command.
uvicorn main:app --reload --workers 1 --host localhost --port 8008
Remember how the AlphaVantage API had a "/query" path that we accessed using a get HTTP request? We're going to build similar paths for our application. Let's start with an MVP example so we can learn how paths work in FastAPI.

VimeoVideo("772219175", h="6f53c61020", width=600)

Task 8.4.11: Create a "/hello" path for your app that returns a greeting when it receives a get request.
We've got our path. Let's perform a get request to see if it works.
VimeoVideo("772219134", h="09a4b98413", width=600)

Task 8.4.12: Create a get request to hit the "/hello" path running at "http://localhost:8008".
url = "http://localhost:8008/hello"
response = requests.get(url)

print("response code:", response.status_code)
response.json()

response code: 200
{'message': 'hello world!'}

Excellent! Now let's start building the fun stuff.
"/fit" Path

Our first path will allow the user to fit a model to stock data when they make a post request to our server. They'll have the choice to use new data from AlphaVantage, or older data that's already in our database. When a user makes a request, they'll receive a response telling them if the operation was successful or whether there was an error.
One thing that's very important when building an API is making sure the user passes the correct parameters into the app. Otherwise, our app could crash! FastAPI works well with the pydantic library, which checks that each request has the correct parameters and data types. It does this by using special data classes that we need to define. Our "/fit" path will take user input and then output a response, so we need two classes: one for input and one for output.
VimeoVideo("772219078", h="4f016b11e1", width=600)

Task 8.4.13: Create definitions for a FitIn and a FitOut data class. The FitIn class should inherit from the pydantic BaseModel, and the FitOut class should inherit from the FitIn class. Be sure to include type hints.
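A minimal sketch of what these definitions might look like, with field names and types inferred from the experiment in the next task (your own class bodies may differ):

```python
from pydantic import BaseModel


class FitIn(BaseModel):
    """Parameters a user must supply to the "/fit" path."""

    ticker: str
    use_new_data: bool
    n_observations: int
    p: int
    q: int


class FitOut(FitIn):
    """Everything in `FitIn`, plus a report on the fit operation."""

    success: bool
    message: str
```

Because FitOut inherits from FitIn, a FitOut instance is also an instance of FitIn, and pydantic raises a validation error whenever a required field is missing or can't be coerced to the annotated type.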
With our data classes defined, let's see how pydantic ensures that our users are supplying the correct input and our application is returning the correct output.
VimeoVideo("772219008", h="ad1114eb9e", width=600)

Task 8.4.14: Use the code below to experiment with your FitIn and FitOut classes. Under what circumstances does instantiating them throw errors? What class or classes are they instances of?

from main import FitIn, FitOut

# Instantiate `FitIn`. Play with parameters.
fi = FitIn(
    ticker="SHOPERSTOP.BSE", use_new_data=True, n_observations=2000, p=1, q=1
)
print(fi)

# Instantiate `FitOut`. Play with parameters.
fo = FitOut(
    ticker="SHOPERSTOP.BSE",
    use_new_data=True,
    n_observations=2000,
    p=1,
    q=1,
    success=True,
    message="Model is ready",
)
print(fo)

ticker='SHOPERSTOP.BSE' p=1 q=1 n_observations=2000 use_new_data=True
ticker='SHOPERSTOP.BSE' p=1 q=1 n_observations=2000 use_new_data=True success=True message='Model is ready'
One cool feature of FastAPI is that it can work in asynchronous scenarios. That's not something we need to learn for this project, but it does mean that we need to instantiate a GarchModel object each time a user makes a request. To make the coding easier, let's write a function to handle that process for us.

VimeoVideo("772218958", h="37744c9d88", width=600)

Task 8.4.15: Create a build_model function in your main module. Use the docstring as a guide, and test your function below.

from main import build_model

# Instantiate `GarchModel` with function
model_shop = build_model(ticker="SHOPERSTOP.BSE", use_new_data=False)

# Is `SQLRepository` attached to `model_shop`?
assert isinstance(model_shop.repo, SQLRepository)

# Is SQLite database attached to `SQLRepository`?
assert isinstance(model_shop.repo.connection, sqlite3.Connection)

# Is `ticker` attribute correct?
assert model_shop.ticker == "SHOPERSTOP.BSE"

# Is `use_new_data` attribute correct?
assert not model_shop.use_new_data

model_shop

<model.GarchModel at 0x7f69a0e12af0>
We've got data classes, we've got a build_model function, and all that's left is to build the "/fit" path. We'll use our "/hello" path as a template, but we'll need to include more features, like error handling.
VimeoVideo("772218892", h="6779ee3470", width=600)

Task 8.4.16: Create a "/fit" path for your app. It will take a FitIn object as input, and then build a GarchModel using the build_model function. The model will wrangle the needed data, fit itself to that data, and save the completed model. Finally, the path will send a response in the form of a FitOut object. Be sure to handle any errors that may arise.
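Setting the FastAPI decorator aside, the heart of this path is a try/except that always returns a well-formed response. Here's a framework-free sketch of that pattern, with plain dictionaries standing in for the FitIn/FitOut objects; the method names wrangle_data, fit, and dump are assumptions about the GarchModel interface described in this lesson:

```python
def fit_model(request, build_model):
    """Sketch of the "/fit" handler logic: build, wrangle, fit, save,
    and always return a FitOut-shaped response, even on failure.

    `request` is a dict with the FitIn fields; `build_model` is the
    factory described above.
    """
    response = dict(request)  # FitOut echoes the FitIn fields

    try:
        # Build the model, get the data, fit, and save to disk
        model = build_model(
            ticker=request["ticker"], use_new_data=request["use_new_data"]
        )
        model.wrangle_data(n_observations=request["n_observations"])
        model.fit(p=request["p"], q=request["q"])
        filename = model.dump()
        response["success"] = True
        response["message"] = f"Trained and saved '{filename}'."
    except Exception as e:
        # Report the error instead of crashing the server
        response["success"] = False
        response["message"] = str(e)

    return response
```

Returning success=False together with the exception message, rather than letting the exception propagate, is what keeps the API from crashing on bad input.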
Last step! Let's make a post request and see how our app responds.

VimeoVideo("772218833", h="6d27fb4539", width=600)

Task 8.4.17: Create a post request to hit the "/fit" path running at "http://localhost:8008". You should train a GARCH(1,1) model on 2000 observations of the Shoppers Stop data you already downloaded. Pass in your parameters as a dictionary using the json argument.

# URL of `/fit` path
url = "http://localhost:8008/fit"

# Data to send to path
json = {
    "ticker": "SHOPERSTOP.BSE",
    "use_new_data": False,
    "n_observations": 2000,
    "p": 1,
    "q": 1,
}

# Response of post request
response = requests.post(url=url, json=json)

# Inspect response
print("response code:", response.status_code)
response.json()

response code: 200
{'ticker': 'SHOPERSTOP.BSE',
'p': 1,
'q': 1,
'n_observations': 2000,
'use_new_data': False,
 'success': True,
 'message': 'Trained and saved models/2023-01-26T13:21:28.306362_SHOPERSTOP.BSE.pkl'}

Boom! Now we can train models using the API we created. Up next: a path for making predictions.
"/predict" Path

For our "/predict" path, users will be able to make a post request with the ticker symbol they want a prediction for and the number of days they want to forecast into the future. Our app will return a forecast or, if there's an error, a message explaining the problem.
The setup will be very similar to our "/fit" path. We'll start with data classes for the input and output.
VimeoVideo("772218808", h="3a73624069", width=600)

Task 8.4.18: Create definitions for a PredictIn and a PredictOut data class. The PredictIn class should inherit from the pydantic BaseModel, and the PredictOut class should inherit from the PredictIn class. Be sure to include type hints. Then use the code below to test your classes.
from main import PredictIn, PredictOut

pi = PredictIn(ticker="SHOPERSTOP.BSE", n_days=5)
print(pi)

po = PredictOut(
    ticker="SHOPERSTOP.BSE", n_days=5, success=True, forecast={}, message="success"
)
print(po)

ticker='SHOPERSTOP.BSE' n_days=5
ticker='SHOPERSTOP.BSE' n_days=5 success=True forecast={} message='success'
Up next, let's create the path. The good news is that we'll be able to reuse our build_model function.
VimeoVideo("772218740", h="ff06859ece", width=600)

Task 8.4.19: Create a "/predict" path for your app. It will take a PredictIn object as input, build a GarchModel, load the most recent trained model for the given ticker, and generate a dictionary of predictions. It will then return a PredictOut object with the predictions included. Be sure to handle any errors that may arise.
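One piece worth sketching is how a raw variance forecast becomes the dictionary of predictions shown in the response below: the keys in that response are business days rendered as ISO-8601 strings, and the values are volatilities (square roots of forecasted variances). Here's a hypothetical helper for that step, assuming you pass in the forecast start date and a sequence of variances; the name clean_prediction and its signature are illustrative, not taken from the course code:

```python
import pandas as pd


def clean_prediction(start_date, variances):
    """Map forecasted variances to ISO-8601 business-day labels,
    returning the volatility (standard deviation) for each day."""
    # Business days skip weekends, matching trading days
    forecast_dates = pd.bdate_range(start=start_date, periods=len(variances))
    return {
        date.isoformat(): float(variance) ** 0.5
        for date, variance in zip(forecast_dates, variances)
    }
```

Note how a forecast starting on a Thursday spans Thursday, Friday, and then Monday through Wednesday, exactly like the dates in the sample response below.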
Last step: let's see what happens when we make a post request.

VimeoVideo("772218642", h="1da744b9e7", width=600)

Task 8.4.20: Create a post request to hit the "/predict" path running at "http://localhost:8008". You should get the 5-day volatility forecast for Shoppers Stop. When you're satisfied, submit your work to the grader.

# URL of `/predict` path
url = "http://localhost:8008/predict"

# Data to send to path
json = {
    "ticker": "SHOPERSTOP.BSE",
    "n_days": 5,
}

# Response of post request
response = requests.post(url=url, json=json)

# Response JSON to be submitted to grader
submission = response.json()

# Inspect JSON
submission

{'ticker': 'SHOPERSTOP.BSE',
'n_days': 5,
'success': True,
'forecast': {'2023-01-26T00:00:00': 1.9746357858800914,
'2023-01-27T00:00:00': 1.988421386318225,
'2023-01-30T00:00:00': 2.0018867616082825,
'2023-01-31T00:00:00': 2.015042027695701,
'2023-02-01T00:00:00': 2.027896831972314},
 'message': ' '}

wqet_grader.grade("Project 8 Assessment", "Task 8.4.20", submission)

You're making this look easy. 😉
Score: 1
We did it! Better said, you did it. You got data from the AlphaVantage API, you stored it in a SQL database, you built and trained a GARCH model to predict volatility, and you created your own API to serve predictions from your model. That's data engineering, data science, and model deployment all in one project. If you haven't already, now's a good time to give yourself a pat on the back. You definitely deserve it.
Copyright 2022 WorldQuant University. This content is licensed solely for personal use. Redistribution or publication of this material is strictly prohibited.
8.2. Test Driven Development
In the previous lesson, we learned how to get data from an API. In this lesson, we have two goals. First, we'll take the code we used to access the API and build an AlphaVantageAPI class. This will allow us to reuse our code. Second, we'll create a SQLRepository class that will help us load our stock data into a SQLite database and then extract it for later use. Additionally, we'll build this code using a technique called test driven development, where we'll use assert statements to make sure everything is working properly. That way, we'll avoid issues later when we build our application.
%load_ext autoreload
%load_ext sql
%autoreload 2

import sqlite3

import matplotlib.pyplot as plt
import pandas as pd
import wqet_grader
from config import settings
from IPython.display import VimeoVideo

wqet_grader.init("Project 8 Assessment")

The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload
The sql extension is already loaded. To reload it, use: %reload_ext sql
VimeoVideo("764766424", h="88dbe3bff8", width=600)

Building Our Data Module
For our application, we're going to keep all the classes we use to extract, transform, and load data in a single module that we'll call data.

AlphaVantage API Class

Let's get started by taking the code we created in the last lesson and incorporating it into a class that will be in charge of getting data from the AlphaVantage API.
VimeoVideo("764766399", h="08b6a61e84", width=600)

Task 8.2.1: In the data module, create a class definition for AlphaVantageAPI. For now, make sure that it has an __init__ method that attaches your API key as the attribute __api_key. Once you're done, import the class below and create an instance of it called av.
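For reference, here's a minimal sketch of that __init__ (in the project you'd default api_key to the key stored in your config settings; the "demo-key" default here is a placeholder). The leading double underscore triggers Python's name mangling, which keeps the key from being read as a plain attribute:

```python
class AlphaVantageAPI:
    """Sketch of a client class for the AlphaVantage API."""

    def __init__(self, api_key="demo-key"):
        # Double leading underscore: stored as `_AlphaVantageAPI__api_key`,
        # so `av.__api_key` won't work from outside the class
        self.__api_key = api_key
```

Methods defined inside the class can still read self.__api_key normally, which is exactly what get_daily will need.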
# Import `AlphaVantageAPI`
from data import AlphaVantageAPI

# Create instance of `AlphaVantageAPI` class
av = AlphaVantageAPI()

print("av type:", type(av))

av type: <class 'data.AlphaVantageAPI'>
Remember the get_daily function we made in the last lesson? Now we're going to turn it into a class method.

VimeoVideo("764766380", h="5b4cf7c753", width=600)

Task 8.2.2: Create a get_daily method for your AlphaVantageAPI class. Once you're done, use the cell below to fetch the stock data for the renewable energy company [Suzlon](https://www.suzlon.com/) and assign it to the DataFrame df_suzlon.
# Define Suzlon ticker symbol
ticker = "SUZLON.BSE"

# Use your `av` object to get daily data
df_suzlon = av.get_daily(ticker)

print("df_suzlon type:", type(df_suzlon))
print("df_suzlon shape:", df_suzlon.shape)
df_suzlon.head()

df_suzlon type: <class 'pandas.core.frame.DataFrame'>
df_suzlon shape: (4253, 5)
| open | high | low | close | volume | |
|---|---|---|---|---|---|
| date | |||||
| 2023-01-25 | 9.66 | 9.75 | 9.41 | 9.49 | 23989499.0 |
| 2023-01-24 | 9.80 | 9.84 | 9.61 | 9.65 | 17695141.0 |
| 2023-01-23 | 10.12 | 10.13 | 9.70 | 9.76 | 21734343.0 |
| 2023-01-20 | 9.65 | 10.24 | 9.65 | 10.03 | 46167723.0 |
| 2023-01-19 | 9.70 | 9.79 | 9.63 | 9.68 | 24547124.0 |
Okay! The next thing we need to do is test our new method to make sure it works the way we want it to. Usually, these sorts of tests are written before writing the method, but, in this first case, we'll do it the other way around in order to get a better sense of how assert statements work.

VimeoVideo("764766326", h="3ffc1a1a2f", width=600)

Task 8.2.3: Create four assert statements to test the output of your get_daily method. Use the comments below as a guide.

# Does `get_daily` return a DataFrame?
assert isinstance(df_suzlon, pd.DataFrame)

# Does DataFrame have 5 columns?
assert df_suzlon.shape[1] == 5

# Does DataFrame have a DatetimeIndex?
assert isinstance(df_suzlon.index, pd.DatetimeIndex)

# Is the index name "date"?
assert df_suzlon.index.name == "date"

VimeoVideo("764766298", h="282ced7752", width=600)

Task 8.2.4: Create two more tests for the output of your get_daily method. Use the comments below as a guide.
# Does DataFrame have correct column names?
assert df_suzlon.columns.to_list() == ['open', 'high', 'low', 'close', 'volume']

# Are columns correct data type?
assert all(df_suzlon.dtypes == float)

Okay! Now that our AlphaVantageAPI is ready to get data, let's turn our focus to the class we'll need for storing our data in our SQLite database.
SQL Repository Class

It wouldn't be efficient if our application needed to get data from the AlphaVantage API every time we wanted to explore our data or build a model, so we'll need to store our data in a database. Because our data is highly structured (each DataFrame we extract from AlphaVantage is always going to have the same five columns), it makes sense to use a SQL database.
We'll use SQLite for our database. For consistency, this database will always have the same name, which we've stored in our .env file.
VimeoVideo("764766285", h="7b6487a28d", width=600)

Task 8.2.5: Connect to the database whose name is stored in the .env file for this project. Be sure to set the check_same_thread argument to False. Assign the connection to the variable connection.

connection = sqlite3.connect(database=settings.db_name, check_same_thread=False)
print("connection type:", type(connection))

connection type: <class 'sqlite3.Connection'>
We've got a connection, and now we need to start building the class that will handle all our transactions with the database. With this class, though, we're going to create our tests before writing the class definition.

VimeoVideo("764766249", h="4359c98af4", width=600)

Task 8.2.6: Write two tests for the SQLRepository class, using the comments below as a guide.

# Import class definition
from data import SQLRepository

# Create instance of class
repo = SQLRepository(connection=connection)

# Does `repo` have a "connection" attribute?
assert hasattr(repo, "connection")

# Is the "connection" attribute a SQLite `Connection`?
assert isinstance(repo.connection, sqlite3.Connection)

Tip: You won't be able to run this ☝️ code block until you complete the task below. 👇
VimeoVideo("764766224", h="71655b61c2", width=600)

Task 8.2.7: Create a definition for your SQLRepository class. For now, just complete the __init__ method. Once you're done, use the code you wrote in the previous task to test it.

The next method we need for the SQLRepository class is one that allows us to store information. In SQL talk, this is generally referred to as inserting tables into the database.

VimeoVideo("764766175", h="6d2f030425", width=600)

Task 8.2.8: Add an insert_table method to your SQLRepository class. As a guide, use the assert statements below and the docstring in the data module. When you're done, run the cell below to check your work.
response = repo.insert_table(table_name=ticker, records=df_suzlon, if_exists="replace")

# Does your method return a dictionary?
assert isinstance(response, dict)

# Are the keys of that dictionary correct?
assert sorted(list(response.keys())) == ["records_inserted", "transaction_successful"]

If our method is passing the assert statements, we know it's returning a record of the database transaction, but we still need to check whether the data has actually been added to the database.
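One plausible implementation of insert_table, shown here as a self-contained sketch built on DataFrame.to_sql; the returned keys follow the assert statements above, and the real class in your data module may differ in its details:

```python
import sqlite3

import pandas as pd


class SQLRepository:
    """Sketch of the repository class (only `insert_table` shown)."""

    def __init__(self, connection):
        self.connection = connection

    def insert_table(self, table_name, records, if_exists="fail"):
        """Insert a DataFrame into the database as a table.

        Returns a dict reporting the transaction, with keys
        "transaction_successful" and "records_inserted".
        """
        # `to_sql` writes the DataFrame (index included) to the table
        records.to_sql(name=table_name, con=self.connection, if_exists=if_exists)
        return {
            "transaction_successful": True,
            "records_inserted": len(records),
        }
```

Passing if_exists="replace" drops and recreates the table, which is why rerunning the cell above doesn't raise an error.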
VimeoVideo("764766150", h="80fc271c75", width=600)

Task 8.2.9: Write a SQL query to get the first five rows of the table of Suzlon data you just inserted into the database.

%load_ext sql
%sql sqlite:////home/jovyan/work/ds-curriculum/080-volatility-forecasting-in-india/stocks.sqlite

The sql extension is already loaded. To reload it, use: %reload_ext sql
'Connected: @/home/jovyan/work/ds-curriculum/080-volatility-forecasting-in-india/stocks.sqlite'

%%sql
select *
from 'SUZLON.BSE'
limit 5

| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2023-01-25 00:00:00 | 9.66 | 9.75 | 9.41 | 9.49 | 23989499.0 |
| 2023-01-24 00:00:00 | 9.8 | 9.84 | 9.61 | 9.65 | 17695141.0 |
| 2023-01-23 00:00:00 | 10.12 | 10.13 | 9.7 | 9.76 | 21734343.0 |
| 2023-01-20 00:00:00 | 9.65 | 10.24 | 9.65 | 10.03 | 46167723.0 |
| 2023-01-19 00:00:00 | 9.7 | 9.79 | 9.63 | 9.68 | 24547124.0 |
We can now insert data into our database, but let's not forget that we need to read data from it, too. Reading will be a little more complex than inserting, so let's start by writing code in this notebook before we incorporate it into our SQLRepository class.

VimeoVideo("764766109", h="d04a7a3f9f", width=600)

Task 8.2.10: First, write a SQL query to get all the Suzlon data. Then use pandas to extract the data from the database and read it into a DataFrame named df_suzlon_test.

sql = "select * from 'SUZLON.BSE'"
df_suzlon_test = pd.read_sql(sql, con=connection, parse_dates=["date"], index_col="date")

print("df_suzlon_test type:", type(df_suzlon_test))
print()
print(df_suzlon_test.info())
df_suzlon_test.head()

df_suzlon_test type: <class 'pandas.core.frame.DataFrame'>

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4253 entries, 2023-01-25 to 2005-10-20
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   open    4253 non-null   float64
 1   high    4253 non-null   float64
 2   low     4253 non-null   float64
 3   close   4253 non-null   float64
 4   volume  4253 non-null   float64
dtypes: float64(5)
memory usage: 199.4 KB
None
| open | high | low | close | volume | |
|---|---|---|---|---|---|
| date | |||||
| 2023-01-25 | 9.66 | 9.75 | 9.41 | 9.49 | 23989499.0 |
| 2023-01-24 | 9.80 | 9.84 | 9.61 | 9.65 | 17695141.0 |
| 2023-01-23 | 10.12 | 10.13 | 9.70 | 9.76 | 21734343.0 |
| 2023-01-20 | 9.65 | 10.24 | 9.65 | 10.03 | 46167723.0 |
| 2023-01-19 | 9.70 | 9.79 | 9.63 | 9.68 | 24547124.0 |
Now that we know how to read a table from our database, let's turn our code into a proper function. But since we're doing backwards design, we need to start with our tests.
VimeoVideo("764772699", h="6d97cff2e8", width=600)

Task 8.2.11: Write tests for a read_table function, using the comments below as a guide.

# Assign `read_table` output to `df_suzlon`
df_suzlon = repo.read_table(table_name="SUZLON.BSE", limit=2500)  # noQA F821

# Is `df_suzlon` a DataFrame?
assert isinstance(df_suzlon, pd.DataFrame)

# Does it have a `DatetimeIndex`?
assert isinstance(df_suzlon.index, pd.DatetimeIndex)

# Is the index named "date"?
assert df_suzlon.index.name == "date"

# Does it have 2,500 rows and 5 columns?
assert df_suzlon.shape == (2500, 5)

# Are the column names correct?
assert df_suzlon.columns.to_list() == ['open', 'high', 'low', 'close', 'volume']

# Are the column data types correct?
assert all(df_suzlon.dtypes == float)

# Print `df_suzlon` info
print("df_suzlon shape:", df_suzlon.shape)
print()
print(df_suzlon.info())
df_suzlon.head()

df_suzlon shape: (2500, 5)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2500 entries, 2023-01-25 to 2012-12-04
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   open    2500 non-null   float64
 1   high    2500 non-null   float64
 2   low     2500 non-null   float64
 3   close   2500 non-null   float64
 4   volume  2500 non-null   float64
dtypes: float64(5)
memory usage: 117.2 KB
None
| open | high | low | close | volume | |
|---|---|---|---|---|---|
| date | |||||
| 2023-01-25 | 9.66 | 9.75 | 9.41 | 9.49 | 23989499.0 |
| 2023-01-24 | 9.80 | 9.84 | 9.61 | 9.65 | 17695141.0 |
| 2023-01-23 | 10.12 | 10.13 | 9.70 | 9.76 | 21734343.0 |
| 2023-01-20 | 9.65 | 10.24 | 9.65 | 10.03 | 46167723.0 |
| 2023-01-19 | 9.70 | 9.79 | 9.63 | 9.68 | 24547124.0 |
Tip: You won't be able to run this ☝️ code block until you complete the task below. 👇
VimeoVideo("764772667", h="afbd47543a", width=600)

Task 8.2.12: Expand on the code you've written above to complete the read_table function below. Use the docstring as a guide.
Tip: Remember that we stored our data sorted descending by date. It'll definitely make our read_table easier to implement!
def read_table(table_name, limit=None):
    """Read table from database.

    Parameters
    ----------
    table_name : str
        Name of table in SQLite database.
    limit : int, None, optional
        Number of most recent records to retrieve. If `None`, all
        records are retrieved. By default, `None`.

    Returns
    -------
    pd.DataFrame
        Index is DatetimeIndex "date". Columns are 'open', 'high',
        'low', 'close', and 'volume'. All columns are numeric.
    """
    # Create SQL query (with optional limit)
    if limit:
        sql = f"select * from '{table_name}' limit {limit}"
    else:
        sql = f"select * from '{table_name}'"

    # Retrieve data, read into DataFrame
    df = pd.read_sql(sql, con=connection, parse_dates=["date"], index_col="date")

    # Return DataFrame
    return df

VimeoVideo("764772652", h="9f89b8c66e", width=600)

Task 8.2.13: Turn the read_table function into a method for your SQLRepository class.
VimeoVideo("764772632", h="3e374abcc3", width=600)

Task 8.2.14: Return to Task 8.2.11 and change the code so that you're testing your class method instead of your notebook function.

Excellent! We have everything we need to get data from AlphaVantage, save that data in our database, and access it later on. Now it's time to do a little exploratory analysis to compare the stocks of the two companies we have data for.
Comparing Stock Returns

We already have the data for Suzlon Energy in our database, but we need to add the data for Ambuja Cement before we can compare the two stocks.

VimeoVideo("764772620", h="d635a99b74", width=600)

Task 8.2.15: Use the instances of the AlphaVantageAPI and SQLRepository classes you created in this lesson (av and repo, respectively) to get the stock data for Ambuja Cement and read it into the database.
ticker = "AMBUJACEM.BSE"

# Get Ambuja data using `av`
ambuja_records = av.get_daily(ticker)

# Insert `ambuja_records` into database using `repo`
response = repo.insert_table(
    table_name=ticker, records=ambuja_records, if_exists="replace"
)
response

{'transaction_successful': True, 'records_inserted': 4452}

Let's take a look at the data to make sure we're getting what we need.
VimeoVideo("764772601", h="f0be0fbb1a", width=600)

Task 8.2.16: Using the read_table method you've added to your SQLRepository, extract the most recent 2,500 rows of data for Ambuja Cement from the database and assign the result to df_ambuja.
ticker = "AMBUJACEM.BSE"

df_ambuja = repo.read_table(table_name=ticker, limit=2500)

print("df_ambuja type:", type(df_ambuja))
print("df_ambuja shape:", df_ambuja.shape)
df_ambuja.head()

df_ambuja type: <class 'pandas.core.frame.DataFrame'>
df_ambuja shape: (2500, 5)
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2023-01-24 | 501.20 | 508.55 | 497.55 | 498.55 | 100346.0 |
| 2023-01-23 | 517.40 | 518.45 | 498.55 | 500.90 | 126483.0 |
| 2023-01-20 | 519.05 | 522.70 | 515.30 | 517.20 | 55838.0 |
| 2023-01-19 | 518.50 | 525.40 | 517.30 | 519.00 | 82121.0 |
| 2023-01-18 | 516.65 | 522.00 | 513.00 | 519.90 | 82300.0 |
We've spent a lot of time so far looking at this data, but what does it actually represent? It turns out the stock market is a lot like any other market: people buy and sell goods. The prices of those goods can go up or down depending on factors like supply and demand. In the case of a stock market, the goods being sold are stocks (also called equities or securities), which represent an ownership stake in a corporation.
During each trading day, the price of a stock will change, so when we're looking at whether a stock might be a good investment, we look at five types of numbers: open, high, low, close, and volume. Open is exactly what it sounds like: the selling price of a share when the market opens for the day. Similarly, close is the selling price of a share when the market closes at the end of the day, and high and low are the respective maximum and minimum prices of a share over the course of the day. Volume is the number of shares of a given stock that have been bought and sold that day. Generally speaking, a firm whose shares have seen a high volume of trading will see more price variation over the course of the day than a firm whose shares have been more lightly traded.
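To make those five columns concrete, here's a tiny sketch with invented numbers (not real market data) showing how two rows of OHLCV data read:

```python
import pandas as pd

# Two hypothetical trading days of OHLCV data (values invented for illustration)
df = pd.DataFrame(
    {
        "open": [100.0, 102.0],
        "high": [105.0, 103.5],
        "low": [99.0, 100.5],
        "close": [102.0, 101.0],
        "volume": [50_000, 32_000],
    },
    index=pd.to_datetime(["2023-01-02", "2023-01-03"]),
)

# Intraday range: how far the share price moved within each trading day
df["range"] = df["high"] - df["low"]

print(df["range"].tolist())  # [6.0, 3.0]
```

Here the first day shows both a wider intraday range and heavier volume, in line with the intuition that heavily traded days tend to show more price variation.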
Let's visualize how the price of Ambuja Cement has changed over the last decade.
**Task 8.2.17:** Plot the closing price of `df_ambuja`. Be sure to label your axes and include a legend.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot `df_ambuja` closing price
df_ambuja["close"].plot(ax=ax, label="AMBUJACEM", color="C1")

# Label axes
plt.xlabel("Date")
plt.ylabel("Closing Price")

# Add legend
plt.legend();
```
Let's add the closing price of Suzlon to our graph so we can compare the two.
**Task 8.2.18:** Create a plot that shows the closing prices of `df_suzlon` and `df_ambuja`. Again, label your axes and include a legend.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot `df_suzlon` and `df_ambuja` closing prices
df_ambuja["close"].plot(ax=ax, label="AMBUJACEM", color="C1")
df_suzlon["close"].plot(ax=ax, label="SUZLON")

# Label axes
plt.xlabel("Date")
plt.ylabel("Closing Price")

# Add legend
plt.legend();
```
Looking at this plot, we might conclude that Ambuja Cement is a "better" stock than Suzlon Energy because its price is higher. But price is just one factor that an investor must consider when creating an investment strategy. What is definitely true is that it's hard to do a head-to-head comparison of these two stocks because there's such a large price difference.
One way in which investors compare stocks is by looking at their returns instead. A return is the change in the value of an investment, represented as a percentage. So let's look at the daily returns for our two stocks.
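Formally, the daily return is the day-over-day percentage change in the closing price — exactly what pandas' `pct_change` computes, multiplied by 100:

```latex
r_t = \frac{P_t - P_{t-1}}{P_{t-1}} \times 100
```

where $P_t$ is the closing price on day $t$.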
**Task 8.2.19:** Add a `"return"` column to `df_ambuja` that shows the percentage change in the `"close"` column from one day to the next.
**Tip:** Our two DataFrames are sorted descending by date, but you'll need to make sure they're sorted ascending in order to calculate their returns.
```python
# Sort DataFrame ascending by date
df_ambuja.sort_index(inplace=True)

# Create "return" column
df_ambuja["return"] = df_ambuja["close"].pct_change() * 100

print("df_ambuja shape:", df_ambuja.shape)
print(df_ambuja.info())
df_ambuja.head()
```

```
df_ambuja shape: (2500, 6)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2500 entries, 2012-12-03 to 2023-01-24
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   open    2500 non-null   float64
 1   high    2500 non-null   float64
 2   low     2500 non-null   float64
 3   close   2500 non-null   float64
 4   volume  2500 non-null   float64
 5   return  2499 non-null   float64
dtypes: float64(6)
memory usage: 136.7 KB
None
```
| date | open | high | low | close | volume | return |
|---|---|---|---|---|---|---|
| 2012-12-03 | 205.15 | 211.90 | 205.15 | 209.65 | 100947.0 | NaN |
| 2012-12-04 | 210.00 | 211.45 | 204.70 | 205.75 | 129566.0 | -1.860243 |
| 2012-12-05 | 207.00 | 208.30 | 205.50 | 207.15 | 105079.0 | 0.680437 |
| 2012-12-06 | 207.50 | 209.00 | 203.60 | 206.65 | 194948.0 | -0.241371 |
| 2012-12-07 | 206.00 | 209.70 | 205.05 | 206.25 | 101636.0 | -0.193564 |
**Task 8.2.20:** Add a `"return"` column to `df_suzlon`.
```python
# Sort DataFrame ascending by date
df_suzlon.sort_index(inplace=True)

# Create "return" column
df_suzlon["return"] = df_suzlon["close"].pct_change() * 100

print("df_suzlon shape:", df_suzlon.shape)
print(df_suzlon.info())
df_suzlon.head()
```

```
df_suzlon shape: (2500, 6)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2500 entries, 2012-12-04 to 2023-01-25
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   open    2500 non-null   float64
 1   high    2500 non-null   float64
 2   low     2500 non-null   float64
 3   close   2500 non-null   float64
 4   volume  2500 non-null   float64
 5   return  2499 non-null   float64
dtypes: float64(6)
memory usage: 136.7 KB
None
```
| date | open | high | low | close | volume | return |
|---|---|---|---|---|---|---|
| 2012-12-04 | 19.55 | 19.60 | 18.65 | 18.80 | 6882221.0 | NaN |
| 2012-12-05 | 18.65 | 19.55 | 18.60 | 18.95 | 7595425.0 | 0.797872 |
| 2012-12-06 | 19.30 | 19.35 | 18.75 | 19.05 | 3557626.0 | 0.527704 |
| 2012-12-07 | 19.05 | 19.35 | 18.60 | 18.70 | 4116932.0 | -1.837270 |
| 2012-12-10 | 18.70 | 19.15 | 18.60 | 18.70 | 2267592.0 | 0.000000 |
```python
wqet_grader.grade("Project 8 Assessment", "Task 8.2.20", df_suzlon)
```

```
Good work!
Score: 1
```
Now let's plot the returns for our two companies and see how the two compare.
**Task 8.2.21:** Plot the returns for `df_suzlon` and `df_ambuja`. Be sure to label your axes and include a legend.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot `df_suzlon` and `df_ambuja` returns
df_suzlon["return"].plot(ax=ax, label="SUZLON")
df_ambuja["return"].plot(ax=ax, label="AMBUJACEM", color="C1")

# Label axes
plt.xlabel("Date")
plt.ylabel("Return [%]")

# Add legend
plt.legend();
```
Success! By representing returns as a percentage, we're able to compare two stocks that have very different prices. But what is this visualization telling us? We can see that the returns for Suzlon have a wider spread. We see big gains and big losses. In contrast, the spread for Ambuja is narrower, meaning that the price doesn't fluctuate as much.
This day-to-day fluctuation in returns is called volatility, which is another important factor for investors. So in the next lesson, we'll learn more about volatility and then build a time series model to predict it.
---

Copyright 2022 WorldQuant University. This content is licensed solely for personal use. Redistribution or publication of this material is strictly prohibited.
# 8.3. Predicting Volatility
In the last lesson, we learned that one characteristic of stocks that's important to investors is **volatility**. Actually, it's so important that there are several time series models for predicting it. In this lesson, we'll build one such model called **GARCH**. We'll also continue working with assert statements to test our code.
```python
import sqlite3

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import wqet_grader
from arch import arch_model
from config import settings
from data import SQLRepository
from IPython.display import VimeoVideo
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

wqet_grader.init("Project 8 Assessment")
```

# Prepare Data
As always, the first thing we need to do is connect to our data source.
## Import
**Task 8.3.1:** Create a connection to your database and then instantiate a `SQLRepository` named `repo` to interact with that database.
- [Open a connection to a SQL database using sqlite3.](../%40textbook/10-databases-sql.ipynb#sqlite3)

```python
connection = sqlite3.connect(settings.db_name, check_same_thread=False)
repo = SQLRepository(connection=connection)

print("repo type:", type(repo))
print("repo.connection type:", type(repo.connection))
```

```
repo type: <class 'data.SQLRepository'>
repo.connection type: <class 'sqlite3.Connection'>
```
Now that we're connected to a database, let's pull out what we need.
**Task 8.3.2:** Pull the most recent 2,500 rows of data for Ambuja Cement from your database. Assign the results to the variable `df_ambuja`.
```python
df_ambuja = repo.read_table(table_name="AMBUJACEM.BSE", limit=2500)

print("df_ambuja type:", type(df_ambuja))
print("df_ambuja shape:", df_ambuja.shape)
df_ambuja.head()
```

```
df_ambuja type: <class 'pandas.core.frame.DataFrame'>
df_ambuja shape: (2500, 5)
```
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2023-01-24 | 501.20 | 508.55 | 497.55 | 498.55 | 100346.0 |
| 2023-01-23 | 517.40 | 518.45 | 498.55 | 500.90 | 126483.0 |
| 2023-01-20 | 519.05 | 522.70 | 515.30 | 517.20 | 55838.0 |
| 2023-01-19 | 518.50 | 525.40 | 517.30 | 519.00 | 82121.0 |
| 2023-01-18 | 516.65 | 522.00 | 513.00 | 519.90 | 82300.0 |
To train our model, the only data we need are the daily returns for `"AMBUJACEM.BSE"`. We learned how to calculate returns in the last lesson, but now let's formalize that process with a wrangle function.
**Task 8.3.3:** Create a `wrangle_data` function whose output is the returns for a stock stored in your database. Use the docstring as a guide and the assert statements in the following code block to test your function.
```python
def wrangle_data(ticker, n_observations):
    """Extract table data from database. Calculate returns.

    Parameters
    ----------
    ticker : str
        The ticker symbol of the stock (also table name in database).
    n_observations : int
        Number of observations to return.

    Returns
    -------
    pd.Series
        Name will be `"return"`. There will be no `NaN` values.
    """
    # Get table from database
    df = repo.read_table(table_name=ticker, limit=n_observations + 1)

    # Sort DataFrame ascending by date
    df.sort_index(inplace=True)

    # Create "return" column
    df["return"] = df["close"].pct_change() * 100

    # Return returns
    return df["return"].dropna()
```

When you run the cell below to test your function, you'll also create a Series `y_ambuja` that we'll use to train our model.
```python
y_ambuja = wrangle_data(ticker="AMBUJACEM.BSE", n_observations=2500)

# Is `y_ambuja` a Series?
assert isinstance(y_ambuja, pd.Series)

# Are there 2500 observations in the Series?
assert len(y_ambuja) == 2500

# Is `y_ambuja` name "return"?
assert y_ambuja.name == "return"

# Does `y_ambuja` have a DatetimeIndex?
assert isinstance(y_ambuja.index, pd.DatetimeIndex)

# Is index sorted ascending?
assert all(y_ambuja.index == y_ambuja.sort_index(ascending=True).index)

# Are there no `NaN` values?
assert y_ambuja.isnull().sum() == 0

y_ambuja.head()
```

```
date
2012-12-03    0.938854
2012-12-04   -1.860243
2012-12-05    0.680437
2012-12-06   -0.241371
2012-12-07   -0.193564
Name: return, dtype: float64
```
Great work! Now that we've got a wrangle function, let's get the returns for Suzlon Energy, too.
**Task 8.3.4:** Use your `wrangle_data` function to get the returns for the 2,500 most recent trading days of Suzlon Energy. Assign the results to `y_suzlon`.
```python
y_suzlon = wrangle_data(ticker="SUZLON.BSE", n_observations=2500)

print("y_suzlon type:", type(y_suzlon))
print("y_suzlon shape:", y_suzlon.shape)
y_suzlon.head()
```

```
y_suzlon type: <class 'pandas.core.series.Series'>
y_suzlon shape: (2500,)
date
2012-12-04   -3.589744
2012-12-05    0.797872
2012-12-06    0.527704
2012-12-07   -1.837270
2012-12-10    0.000000
Name: return, dtype: float64
```
## Explore
Let's recreate the volatility time series plot we made in the last lesson so that we have a visual aid to talk about what volatility is.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot returns for `y_suzlon` and `y_ambuja`
y_suzlon.plot(ax=ax, label="SUZLON")
y_ambuja.plot(ax=ax, label="AMBUJACEM")

# Label axes
plt.xlabel("Date")
plt.ylabel("Return")

# Add legend
plt.legend();
```

The above plot shows how returns change over time. This may seem like a totally new concept, but if we visualize them without considering time, things will start to look familiar.
**Task 8.3.5:** Create a histogram of `y_ambuja` with 25 bins. Be sure to label the x-axis `"Returns"`, the y-axis `"Frequency [count]"`, and use the title `"Distribution of Ambuja Cement Daily Returns"`.
```python
# Create histogram of `y_ambuja`, 25 bins
plt.hist(y_ambuja, bins=25)

# Add axis labels
plt.xlabel("Returns")
plt.ylabel("Frequency [count]")

# Add title
plt.title("Distribution of Ambuja Cement Daily Returns");
```
This is a familiar shape! It turns out that returns follow an almost normal distribution, centered on `0`. **Volatility** is the measure of the spread of these returns around the mean. In other words, volatility in finance is the same thing as standard deviation in statistics.
Let's start by measuring the daily volatility of our two stocks. Since our data frequency is also daily, this will be exactly the same as calculating the standard deviation.
**Task 8.3.6:** Calculate daily volatility for Suzlon and Ambuja, assigning them to the variables `suzlon_daily_volatility` and `ambuja_daily_volatility`, respectively.
```python
suzlon_daily_volatility = y_suzlon.std()
ambuja_daily_volatility = y_ambuja.std()

print("Suzlon Daily Volatility:", suzlon_daily_volatility)
print("Ambuja Daily Volatility:", ambuja_daily_volatility)
```

```
Suzlon Daily Volatility: 3.979808410375838
Ambuja Daily Volatility: 1.8911809353492421
```
Looks like Suzlon is more volatile than Ambuja. This reinforces what we saw in our time series plot, where Suzlon returns have a much wider spread.
While daily volatility is useful, investors are also interested in volatility over other time periods — like annual volatility. Keep in mind that a year isn't 365 days for a stock market, though. After excluding weekends and holidays, most markets have only 252 trading days.
So how do we go from daily to annual volatility? The same way we calculated the standard deviation for our multi-day experiment in Project 7!
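In symbols, under the simplifying assumption that daily returns are independent, variances add across the 252 trading days, which is where the square root comes from:

```latex
\sigma_{\text{annual}}^{2} = 252 \, \sigma_{\text{daily}}^{2}
\quad \Longrightarrow \quad
\sigma_{\text{annual}} = \sqrt{252} \, \sigma_{\text{daily}}
```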
**Task 8.3.7:** Calculate the annual volatility for Suzlon and Ambuja, assigning the results to `suzlon_annual_volatility` and `ambuja_annual_volatility`, respectively.
```python
suzlon_annual_volatility = suzlon_daily_volatility * np.sqrt(252)
ambuja_annual_volatility = ambuja_daily_volatility * np.sqrt(252)

print("Suzlon Annual Volatility:", suzlon_annual_volatility)
print("Ambuja Annual Volatility:", ambuja_annual_volatility)
```

```
Suzlon Annual Volatility: 63.17749991722655
Ambuja Annual Volatility: 30.021566634963698
```
Again, Suzlon has higher volatility than Ambuja. What do you think it means that the annual volatility is larger than the daily volatility?
Since we're dealing with time series data, another way to look at volatility is by calculating it using a rolling window. We'll do this the same way we calculated the rolling average for PM 2.5 levels in Project 3. Here, we'll start focusing on Ambuja Cement exclusively.
**Task 8.3.8:** Calculate the rolling volatility for `y_ambuja`, using a 50-day window. Assign the result to `ambuja_rolling_50d_volatility`.
```python
ambuja_rolling_50d_volatility = y_ambuja.rolling(window=50).std().dropna()

print("rolling_50d_volatility type:", type(ambuja_rolling_50d_volatility))
print("rolling_50d_volatility shape:", ambuja_rolling_50d_volatility.shape)
ambuja_rolling_50d_volatility.head()
```

```
rolling_50d_volatility type: <class 'pandas.core.series.Series'>
rolling_50d_volatility shape: (2451,)
date
2013-02-11    1.693489
2013-02-12    1.686803
2013-02-13    1.672411
2013-02-14    1.669380
2013-02-15    1.674610
Name: return, dtype: float64
```
Now let's visualize this rolling volatility alongside the daily returns for Ambuja Cement.
**Task 8.3.9:** Create a time series plot showing the daily returns for Ambuja Cement and the 50-day rolling volatility. Be sure to label your axes and include a legend.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot `y_ambuja`
y_ambuja.plot(ax=ax, label="Daily Return")

# Plot `ambuja_rolling_50d_volatility`
ambuja_rolling_50d_volatility.plot(
    ax=ax, label="50-Day Rolling Volatility", linewidth=3
)

# Label axes
plt.xlabel("Date")
plt.ylabel("Return [%]")

# Add legend
plt.legend();
```
Here we can see that volatility goes up when the returns change drastically — either up or down. For instance, we can see a big increase in volatility in May 2020, when there were several days of large negative returns. We can also see volatility go down in August 2022, when there are only small day-to-day changes in returns.
This plot reveals a problem. We want to use returns to see if high volatility on one day is associated with high volatility on the following day. But high volatility is caused by large changes in returns, which can be either positive or negative. How can we assess negative and positive numbers together without them canceling each other out? One solution is to take the absolute value of the numbers, which is what we do to calculate performance metrics like mean absolute error. The other solution, which is more common in this context, is to square all the values.
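A quick numeric sketch (with invented numbers) of the cancellation problem described above: big up and down moves average out to zero, but squaring preserves their magnitude:

```python
# Hypothetical daily returns in percent: two big swings in each direction
returns = [-2.0, 2.0, -3.0, 3.0]

# Raw mean: the positive and negative moves cancel each other out
mean_return = sum(returns) / len(returns)

# Mean of squared returns: every move contributes its magnitude
mean_squared = sum(r ** 2 for r in returns) / len(returns)

print(mean_return)   # 0.0
print(mean_squared)  # 6.5
```

The raw mean suggests nothing happened, while the squared mean makes the size of the swings visible — which is why the plots below use squared returns.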
**Task 8.3.10:** Create a time series plot of the squared returns in `y_ambuja`. Don't forget to label your axes.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot squared returns
(y_ambuja ** 2).plot(ax=ax)

# Add axis labels
plt.xlabel("Date")
plt.ylabel("Squared Return");
```

Perfect! Now it's much easier to see that (1) we have periods of high and low volatility, and (2) high volatility days tend to cluster together. This is a perfect situation to use a GARCH model.
A GARCH model is sort of like the ARMA model we learned about in Lesson 3.4. It has a p parameter handling correlations at prior time steps and a q parameter for dealing with "shock" events. It also uses the notion of lag. To see how many lags we should have in our model, we should create an ACF and PACF plot — but using the squared returns.
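To make the idea concrete, here's a minimal pure-NumPy sketch of the GARCH(1,1) conditional-variance recursion, σ²ₜ = ω + α·r²ₜ₋₁ + β·σ²ₜ₋₁, with made-up parameter values. This is not the fitting procedure the `arch` library uses — it only shows how yesterday's squared return and yesterday's variance feed into today's variance:

```python
import numpy as np

def garch_variance(returns, omega, alpha, beta):
    """Conditional variance under GARCH(1,1), seeded with the sample variance."""
    sigma2 = np.empty(len(returns))
    sigma2[0] = np.var(returns)  # a common choice of starting value
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical returns and parameters, chosen only for illustration
r = np.array([0.5, -2.0, 1.0, -0.5])
sigma2 = garch_variance(r, omega=0.1, alpha=0.1, beta=0.8)

print(sigma2)
```

Note how the large move at `t=1` pushes the conditional variance up on the following day — exactly the volatility-clustering behavior we saw in the squared-returns plot.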
**Task 8.3.11:** Create an ACF plot of squared returns for Ambuja Cement. Be sure to label your x-axis `"Lag [days]"` and your y-axis `"Correlation Coefficient"`.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Create ACF of squared returns
plot_acf(y_ambuja ** 2, ax=ax)

# Add axis labels
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient");
```

**Task 8.3.12:** Create a PACF plot of squared returns for Ambuja Cement. Be sure to label your x-axis `"Lag [days]"` and your y-axis `"Correlation Coefficient"`.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Create PACF of squared returns
plot_pacf(y_ambuja ** 2, ax=ax)

# Add axis labels
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient");
```

In our PACF, it looks like a lag of 3 would be a good starting point.
Normally, at this point in the model building process, we would split our data into training and test sets, and then set a baseline. Not this time. This is because our model's input and its output are two different measurements. We'll use returns to train our model, but we want it to predict volatility. If we created a test set, it wouldn't give us the "true values" that we'd need to assess our model's performance. So this time, we'll skip right to iterating.
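The walk-forward idea can be sketched in a few lines. This is only an illustration of the mechanics, using a trailing standard deviation as a stand-in one-step "volatility model" (the lesson's real model will be GARCH, and all names and data here are invented):

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0, 1, size=60)  # hypothetical daily returns

train_size = 50
predictions = []
for t in range(train_size, len(returns)):
    history = returns[:t]       # everything observed up to (not including) day t
    pred_vol = history.std()    # stand-in one-step-ahead volatility forecast
    predictions.append(pred_vol)

# One volatility prediction per held-out day
print(len(predictions))  # 10
```

Each step refits on all data seen so far and predicts one day ahead, which is how we'll evaluate the GARCH model later without needing "true volatility" labels in a fixed test set.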
## Split
The last thing we need to do before building our model is to create a training set. Note that we won't create a test set here. Rather, we'll use all of `y_ambuja` to conduct walk-forward validation after we've built our model.
**Task 8.3.13:** Create a training set `y_ambuja_train` that contains the first 80% of the observations in `y_ambuja`.
```python
cutoff_test = int(len(y_ambuja) * 0.8)
y_ambuja_train = y_ambuja.iloc[:cutoff_test]

print("y_ambuja_train type:", type(y_ambuja_train))
print("y_ambuja_train shape:", y_ambuja_train.shape)
y_ambuja_train.tail()
```

```
y_ambuja_train type: <class 'pandas.core.series.Series'>
y_ambuja_train shape: (2000,)
date
2021-01-12   -0.795854
2021-01-13   -1.138060
2021-01-14    0.717116
2021-01-15   -1.068016
2021-01-18   -3.049242
Name: return, dtype: float64
```
# Build Model
Just like we did the last time we built a model like this, we'll begin by iterating.
## Iterate
**Task 8.3.14:** Build and fit a GARCH model using the data in `y_ambuja_train`. Start with `3` as the value for `p` and `q`. Then use the model summary to assess its performance and try other lags.
```python
# Build and train model
model = arch_model(
    y_ambuja_train,
    p=1,
    q=1,
    rescale=False
).fit(disp=0)

print("model type:", type(model))

# Show model summary
model.summary()
```

```
model type: <class 'arch.univariate.base.ARCHModelResult'>
```
| Dep. Variable: | return | R-squared: | 0.000 |
|---|---|---|---|
| Mean Model: | Constant Mean | Adj. R-squared: | 0.000 |
| Vol Model: | GARCH | Log-Likelihood: | -4008.57 |
| Distribution: | Normal | AIC: | 8025.13 |
| Method: | Maximum Likelihood | BIC: | 8047.54 |
| No. Observations: | 2000 | ||
| Date: | Thu, Jan 26 2023 | Df Residuals: | 1999 |
| Time: | 10:58:17 | Df Model: | 1 |
| coef | std err | t | P>|t| | 95.0% Conf. Int. | |
|---|---|---|---|---|---|
| mu | 0.0488 | 3.905e-02 | 1.248 | 0.212 | [-2.779e-02, 0.125] |
| coef | std err | t | P>|t| | 95.0% Conf. Int. | |
|---|---|---|---|---|---|
| omega | 0.1785 | 6.948e-02 | 2.570 | 1.018e-02 | [4.235e-02, 0.315] |
| alpha[1] | 0.0705 | 1.844e-02 | 3.821 | 1.327e-04 | [3.432e-02, 0.107] |
| beta[1] | 0.8783 | 3.296e-02 | 26.651 | 1.750e-156 | [ 0.814, 0.943] |
Covariance estimator: robust
**Tip:** You can access the AIC and BIC scores programmatically. Every `ARCHModelResult` has an `.aic` and a `.bic` attribute. Try it for yourself: enter `model.aic` or `model.bic`.
Now that we've settled on a model, let's visualize its predictions, together with the Ambuja returns.
**Task 8.3.15:** Create a time series plot with the Ambuja returns and the conditional volatility for your `model`. Be sure to include axis labels and add a legend.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot `y_ambuja_train`
y_ambuja_train.plot(ax=ax, label="Ambuja Daily Return")

# Plot conditional volatility * 2
(2 * model.conditional_volatility).plot(
    ax=ax, label="2 SD Conditional Volatility", color="C1", linewidth=3
)

# Plot conditional volatility * -2
(-2 * model.conditional_volatility).rename("").plot(ax=ax, color="C1", linewidth=3)

# Add legend
plt.legend();
```
Visually, our model looks pretty good, but we should examine the residuals, just to make sure. In the case of GARCH models, we need to look at the standardized residuals.
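As a reminder of what "standardized" means here: each residual is divided by that day's conditional volatility, so a well-specified model should leave residuals that look like white noise with a spread of about 1. A minimal sketch with invented numbers (`arch` exposes the real thing as `model.std_resid`):

```python
import numpy as np

# Hypothetical residuals and fitted conditional volatilities (invented values)
resid = np.array([1.0, -2.0, 0.5])
conditional_volatility = np.array([2.0, 2.0, 1.0])

# Standardized residual: raw residual scaled by same-day volatility
std_resid = resid / conditional_volatility

print(std_resid)
```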
```python
VimeoVideo("770038994", h="2a13ab49a7", width=600)
```

**Task 8.3.16:** Create a time series plot of the standardized residuals for your `model`. Be sure to include axis labels and a legend.
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot standardized residuals
model.std_resid.plot(ax=ax, label="Standardized Residuals")

# Add axis labels
plt.xlabel("Date")
plt.ylabel("Value")

# Add legend
plt.legend()
```
These residuals look good: they have a consistent mean and spread over time. Let's check their normality using a histogram.
```python
VimeoVideo("770038970", h="f76c8f6529", width=600)
```

**Task 8.3.17:** Create a histogram with 25 bins of the standardized residuals for your model. Be sure to label your axes and use a title.
```python
# Create histogram of standardized residuals, 25 bins
plt.hist(model.std_resid, bins=25)

# Add axis labels
plt.xlabel("Standardized Residual")
plt.ylabel("Frequency")

# Add title
plt.title("Distribution of Standardized Residuals")
```
Our last visualization will be the ACF of the standardized residuals. Just like we did with our first ACF, we'll need to square the values here, too.
```python
VimeoVideo("770038952", h="c7a3cfe34f", width=600)
```

**Task 8.3.18:** Create an ACF plot of the square of your standardized residuals. Don't forget axis labels!
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Create ACF of squared, standardized residuals
plot_acf(model.std_resid ** 2, ax=ax)

# Add axis labels
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient")
```

Excellent! Looks like this model is ready for a final evaluation.
## Evaluate
To evaluate our model, we'll do walk-forward validation. Before we do, let's take a look at how this model returns its predictions.
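As a reminder of the pattern, walk-forward validation grows the training window one observation at a time and makes a one-step-ahead forecast at each step. Here's a minimal sketch using a naive "repeat the last value" forecaster on made-up data, just to show the loop shape; the real task refits a GARCH model at each step.

```python
import pandas as pd

# Toy series standing in for daily returns (values are invented)
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
test_size = 2

predictions = []
for i in range(test_size):
    # Training window grows by one observation each iteration
    y_train = series.iloc[: -(test_size - i)]
    # Naive one-step-ahead forecast: repeat the last observed value
    predictions.append(y_train.iloc[-1])

y_test_wfv = pd.Series(predictions, index=series.tail(test_size).index)
print(y_test_wfv.tolist())  # [3.0, 4.0]
```

Each prediction is made using only data available before that point, so the evaluation mimics how the model would perform in production.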
```python
VimeoVideo("770038921", h="f74869b8fc", width=600)
```

**Task 8.3.19:** Create a one-day forecast from your `model` and assign the result to the variable `one_day_forecast`.
```python
one_day_forecast = model.forecast(horizon=1, reindex=False).variance.iloc[0, 0] ** 0.5

print("one_day_forecast type:", type(one_day_forecast))
one_day_forecast
```

```
one_day_forecast type: <class 'numpy.float64'>
1.8485320751360448
```
There are two things we need to keep in mind here. First, our `model` forecast shows the predicted **variance**, not the **standard deviation** / **volatility**. So we'll need to take the square root of the value. Second, the prediction is in the form of a DataFrame. It has a DatetimeIndex, and the date is the last day for which we have training data. The `"h.1"` column stands for "horizon 1", that is, our model's prediction for the following day. We'll have to keep all this in mind when we reformat this prediction to serve to the end user of our application.
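To make the variance-to-volatility point concrete, here's a toy DataFrame shaped like an `ARCHModelForecast` variance table. The column names mimic arch's `"h.1"`, `"h.2"` labels, but the numbers are made up.

```python
import pandas as pd

# Toy stand-in for a forecast variance DataFrame (values are invented)
variance = pd.DataFrame(
    {"h.1": [4.0], "h.2": [9.0]},
    index=pd.DatetimeIndex(["2023-01-23"], name="date"),
)

# Volatility (standard deviation) is the square root of variance
volatility = variance ** 0.5
print(volatility)  # h.1 -> 2.0, h.2 -> 3.0
```

Raising the whole DataFrame to the power 0.5 applies the square root element-wise, which is why the same trick works on the real forecast.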
```python
VimeoVideo("770038861", h="10efe8c445", width=600)
```

**Task 8.3.20:** Complete the code below to do walk-forward validation on your `model`. Then run the following code block to visualize the model's test predictions.
```python
# Create empty list to hold predictions
predictions = []

# Calculate size of test data (20%)
test_size = int(len(y_ambuja) * 0.2)

# Walk forward
for i in range(test_size):
    # Create test data
    y_train = y_ambuja.iloc[: -(test_size - i)]

    # Train model
    model = arch_model(y_train, p=1, q=1, rescale=False).fit(disp=0)

    # Generate next prediction (volatility, not variance)
    next_pred = model.forecast(horizon=1, reindex=False).variance.iloc[0, 0] ** 0.5

    # Append prediction to list
    predictions.append(next_pred)

# Create Series from predictions list
y_test_wfv = pd.Series(predictions, index=y_ambuja.tail(test_size).index)

print("y_test_wfv type:", type(y_test_wfv))
print("y_test_wfv shape:", y_test_wfv.shape)
y_test_wfv.head()
```

```
y_test_wfv type: <class 'pandas.core.series.Series'>
y_test_wfv shape: (500,)
date
2021-01-19    1.848532
2021-01-20    1.879677
2021-01-21    1.811127
2021-01-22    2.004459
2021-01-25    2.004325
dtype: float64
```
```python
fig, ax = plt.subplots(figsize=(15, 6))

# Plot returns for test data
y_ambuja.tail(test_size).plot(ax=ax, label="Ambuja Return")

# Plot volatility predictions * 2
(2 * y_test_wfv).plot(ax=ax, c="C1", label="2 SD Predicted Volatility")

# Plot volatility predictions * -2
(-2 * y_test_wfv).plot(ax=ax, c="C1")

# Label axes
plt.xlabel("Date")
plt.ylabel("Return")

# Add legend
plt.legend();
```

This looks pretty good. Our volatility predictions seem to follow the changes in returns over time. This is especially clear in the low-volatility period in the summer of 2022 and the high-volatility period in fall 2022.
One additional step we could do to evaluate how our model performs on the test data would be to plot the ACF of the standardized residuals for only the test set. But you can do that step on your own.
## Communicate Results
Normally in this section, we create visualizations for a human audience, but our goal for *this* project is to create an API for a *computer* audience. So we'll focus on transforming our model's predictions to JSON format, which is what we'll use to send predictions in our application.
The first thing we need to do is create a DatetimeIndex for our predictions. Using labels like "h.1", "h.2", etc., won't work. There are two things to keep in mind. First, we can't include dates that fall on weekends, because no trading happens on those days. Second, we'll need to write our dates as strings that follow the ISO 8601 standard.
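pandas handles both requirements for us: `pd.bdate_range` generates business days only, and every `Timestamp` has an `.isoformat` method. A quick demonstration, with dates chosen so the range spans a weekend:

```python
import pandas as pd

# 2023-01-20 is a Friday, so a business-day range skips straight to Monday
dates = pd.bdate_range(start="2023-01-20", periods=3)
labels = [d.isoformat() for d in dates]
print(labels)
# ['2023-01-20T00:00:00', '2023-01-23T00:00:00', '2023-01-24T00:00:00']
```

Note that `bdate_range` knows nothing about market holidays, only weekends, which is good enough for this lesson.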
```python
VimeoVideo("770038804", h="8976257596", width=600)
```

**Task 8.3.21:** Below is a `prediction`, which contains a 5-day forecast from our `model`. Using it as a starting point, create a `prediction_index`. This should be a list with the following 5 dates written in ISO 8601 format.
```python
# Generate 5-day volatility forecast
prediction = model.forecast(horizon=5, reindex=False).variance ** 0.5
print(prediction)

# Calculate forecast start date
start = prediction.index[0] + pd.DateOffset(days=1)

# Create date range
prediction_dates = pd.bdate_range(start=start, periods=prediction.shape[1])

# Create prediction index labels, ISO 8601 format
prediction_index = [d.isoformat() for d in prediction_dates]

print("prediction_index type:", type(prediction_index))
print("prediction_index len:", len(prediction_index))
prediction_index[:3]
```

```
                 h.1       h.2       h.3       h.4       h.5
date
2023-01-23  1.773645  1.780819  1.787626  1.794086  1.800218
prediction_index type: <class 'list'>
prediction_index len: 5
['2023-01-24T00:00:00', '2023-01-25T00:00:00', '2023-01-26T00:00:00']
```
Now that we know how to create the index, let's create a function to combine the index and predictions, and then return a dictionary where each key is a date and each value is a predicted volatility.
```python
VimeoVideo("770039565", h="d419d0a78d", width=600)
```

**Task 8.3.22:** Create a `clean_prediction` function. It should take a variance prediction DataFrame as input and return a dictionary where each key is a date in ISO 8601 format and each value is the predicted volatility. Use the docstring as a guide and the assert statements to test your function. When you're satisfied with the result, submit it to the grader.
```python
def clean_prediction(prediction):

    """Reformat model prediction to JSON.

    Parameters
    ----------
    prediction : pd.DataFrame
        Variance from an `ARCHModelForecast`

    Returns
    -------
    dict
        Forecast of volatility. Each key is date in ISO 8601 format.
        Each value is predicted volatility.
    """
    # Calculate forecast start date
    start = prediction.index[0] + pd.DateOffset(days=1)

    # Create date range
    prediction_dates = pd.bdate_range(start=start, periods=prediction.shape[1])

    # Create prediction index labels, ISO 8601 format
    prediction_index = [d.isoformat() for d in prediction_dates]

    # Extract predictions from DataFrame, get square root
    data = prediction.values.flatten() ** 0.5

    # Combine `data` and `prediction_index` into Series
    prediction_formatted = pd.Series(data, index=prediction_index)

    # Return Series as dictionary
    return prediction_formatted.to_dict()
```

```python
prediction = model.forecast(horizon=10, reindex=False).variance
prediction_formatted = clean_prediction(prediction)

# Is `prediction_formatted` a dictionary?
assert isinstance(prediction_formatted, dict)

# Are keys correct data type?
assert all(isinstance(k, str) for k in prediction_formatted.keys())

# Are values correct data type?
assert all(isinstance(v, float) for v in prediction_formatted.values())

prediction_formatted
```

```
{'2023-01-24T00:00:00': 1.7736452768458828,
 '2023-01-25T00:00:00': 1.780818750383704,
 '2023-01-26T00:00:00': 1.787625557186955,
 '2023-01-27T00:00:00': 1.794085824430055,
 '2023-01-30T00:00:00': 1.80021842893306,
 '2023-01-31T00:00:00': 1.8060410904036288,
 '2023-02-01T00:00:00': 1.811570456053118,
 '2023-02-02T00:00:00': 1.8168221775650162,
 '2023-02-03T00:00:00': 1.8218109812627508,
 '2023-02-06T00:00:00': 1.8265507322128498}
```

```python
wqet_grader.grade("Project 8 Assessment", "Task 8.3.21", prediction_formatted)
```

🥳

Score: 1
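Since the application will serve this dictionary as JSON, it's worth confirming that a `clean_prediction`-style dict serializes without any custom encoders. A quick sketch using the standard library (the values here are made up):

```python
import json

# Dict shaped like clean_prediction's output (values are invented)
prediction = {
    "2023-01-24T00:00:00": 1.7736,
    "2023-01-25T00:00:00": 1.7808,
}

# String keys and float values round-trip through JSON cleanly
payload = json.dumps(prediction)
print(payload)
```

This is exactly why we converted the Timestamps to strings first: raw `pd.Timestamp` keys would make `json.dumps` raise a `TypeError`.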
Great work! We now have several components for our application: classes for getting data from an API, classes for storing it in a database, and code for building our model and cleaning our predictions. The next step is creating a class for our model and paths for our application — both of which we'll do in the next lesson.
---

Copyright 2022 WorldQuant University. This content is licensed solely for personal use. Redistribution or publication of this material is strictly prohibited.